10 Points to Include in Your Disaster Recovery Plan
Does your business have a disaster recovery plan?
Data loss, downtime, and tech outages are horror stories that even top companies run into nowadays. When a disaster strikes, engineering teams rush to repair the damage while PR teams work overtime to restore customer confidence. Don’t you think that’s a time-consuming and expensive effort? Of course it is! Yet some organizations manage these disasters effectively and with less collateral damage. Wondering how? Simple: they have a comprehensive, easy-to-follow, and regularly tested disaster recovery plan.
Disasters come uninvited and bring complex challenges that organizations might take months or years to overcome. A disaster recovery plan is a long-term assurance of business operability, because it is designed to help businesses reduce the damage caused by unpredicted outages.
Do you already have a disaster recovery plan, or are you just beginning to create one for your organization? In either case, the disaster recovery plan checklist below will help you add all the crucial components to your plan.
1. Fix the Disaster Recovery Objectives
Disaster recovery helps you keep your business operating as usual, so you need to decide which IT services are most critical to running your organization, and set the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) required for those services and machines. But what are RTO and RPO?
RTO: The amount of time allowed to recover from a disaster after notification of business disruption. If your business cannot withstand even an hour of downtime without losing customers to your competitors, then a short RTO is crucial, and you need a reliable disaster recovery plan that clearly states the allowed RTO.
RPO: The window of time for which data loss is acceptable. If, after a disaster strikes, your business can survive losing the last four hours of data but losing more than that would be catastrophic, then your RPO is four hours.
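To make the two objectives concrete, here is a minimal Python sketch. The function names, timestamps, and the one-hour and four-hour targets are illustrative assumptions, not values from any particular tool. It treats the gap between the last good backup and the disaster as the data-loss window (compared against the RPO) and the gap between the disaster and restored service as the downtime (compared against the RTO).

```python
from datetime import datetime, timedelta

def meets_rpo(last_backup: datetime, disaster_time: datetime, rpo: timedelta) -> bool:
    """Data written after the last backup is lost; that window must fit within the RPO."""
    return disaster_time - last_backup <= rpo

def meets_rto(disaster_time: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """Downtime runs from the disaster until service is restored; it must fit within the RTO."""
    return service_restored - disaster_time <= rto

# Hypothetical incident: disaster at 14:00, last backup at 11:00, service back at 15:30.
disaster = datetime(2024, 1, 15, 14, 0)
print(meets_rpo(datetime(2024, 1, 15, 11, 0), disaster, timedelta(hours=4)))   # True: 3 hours of data lost
print(meets_rto(disaster, datetime(2024, 1, 15, 15, 30), timedelta(hours=1)))  # False: 90 minutes of downtime
```

In this example the backup schedule satisfies a four-hour RPO, but the recovery itself takes too long for a one-hour RTO, so the recovery procedure rather than the backup frequency is what would need work.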
An organization’s RTO and RPO are sure to affect its recovery strategy and the associated expenses. To reduce the total cost of the disaster recovery strategy, it is better to divide applications into tiers. The highest tier, reserved for mission-critical applications, would require a disaster recovery technology based on real-time continuous data replication; a mid-level tier might require a snapshot-based solution; and the lowest tier may get by with a simple file-level backup system.
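The tiering idea can be sketched as a small lookup. The thresholds below (15 minutes and 4 hours) and the application names are purely hypothetical; each organization would set its own cut-offs. The function uses the stricter of the two objectives to pick a tier.

```python
from datetime import timedelta

def pick_tier(rto: timedelta, rpo: timedelta) -> str:
    """Map an application's objectives to a protection tier (thresholds are assumptions)."""
    tightest = min(rto, rpo)
    if tightest <= timedelta(minutes=15):
        return "tier 1: real-time continuous data replication"
    if tightest <= timedelta(hours=4):
        return "tier 2: snapshot-based protection"
    return "tier 3: file-level backup"

# Hypothetical applications with their (RTO, RPO) targets.
apps = {
    "payments-api": (timedelta(minutes=5), timedelta(minutes=1)),
    "internal-wiki": (timedelta(hours=2), timedelta(hours=4)),
    "archive-share": (timedelta(days=2), timedelta(days=1)),
}
for name, (rto, rpo) in apps.items():
    print(f"{name}: {pick_tier(rto, rpo)}")
```

Grouping applications this way keeps the expensive replication technology limited to the few services that genuinely need it.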
2. Recognize the Stakeholders in Your Disaster Recovery Plan
The next crucial step is to identify everyone who needs to be updated once disaster strikes. Engineers, support staff, and executives will be involved in performing the actual disaster recovery, but you need to identify others too, such as vendors, members of the PR and marketing teams, third-party suppliers, and key customers. Many companies maintain a register of stakeholders in their project office documentation so they know whom to notify in case of a disaster.
3. Gather Entire Infrastructure Documentation
When a disaster occurs, everything is thrown into disarray and everyone is under pressure. Your engineering teams may well have the skills and knowledge to activate disaster recovery procedures, but infrastructure documentation is still mandatory. Even highly proficient engineers prefer to go command by command from the infrastructure documentation while performing disaster recovery.
So what does this documentation comprise? The entire setup of systems and their usage (installation, recovery procedures, running applications, OS and configuration), cloud templates, storage and databases (how and where the data is saved, how backups are restored, how the data is verified for accuracy), and all your mapped network connections (with functioning devices and their configuration).
4. Cherry-pick the Precise Technology
Disaster Recovery as a Service (DRaaS) and on-premises disaster recovery are not the only feasible solutions available for business continuity. Another option is cloud-based disaster recovery, where an automated disaster recovery solution spins up your disaster recovery site on a public cloud such as Microsoft Azure, AWS, or Google Cloud in minutes.
Before you choose a solution, be sure to consider the total cost of ownership, maintenance requirements, scalability, the ability to recover to a previous point in time, and ease of testing. There are many choices when it comes to disaster recovery solutions, so do thorough research and choose wisely.
5. Launch Communication Channels
No one knows when disaster will knock on your door, so your organization must keep a list of the teams responsible for disaster recovery, along with their roles and contact information. Establish a comprehensive chain of command that includes accountable individuals from each of the engineering teams (e.g. database, systems, network, storage) and the relevant executive leadership. Also, set up dedicated communication channels and hubs, or an online information-sharing tool to use for instant messaging.
6. Outline an Incident Response Procedure
If you have a disaster recovery plan, then an “incident response procedure” is a must. In it, the company defines in detail which events have to be declared a disaster. For instance, if your system goes down, will you consider that a disaster? The plan should also indicate how to verify the disaster and how it will be reported: by an automatic monitoring system, by calls from site reliability engineering (SRE) teams, or by customers.
To verify that a disaster is really happening, check the status of critical network devices, application logs, server hardware, and any other critical components in your production system that you monitor proactively. If critical components are behaving oddly or not working, you most likely have a disaster on your hands.
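The verification step can be sketched as a set of health probes plus a declaration rule. Everything here is an assumption for illustration: the `Check` structure, the component names, the stubbed probes, and the rule that two failed critical checks are enough to declare a disaster. A real plan would define its own criteria and wire the probes to actual monitoring.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Check:
    name: str
    probe: Callable[[], bool]  # hypothetical probe: returns True when the component is healthy
    critical: bool = True

def failed_checks(checks: List[Check]) -> List[Check]:
    """Run every probe and return the checks that failed."""
    failed = []
    for check in checks:
        try:
            healthy = check.probe()
        except Exception:
            healthy = False  # a probe that cannot run is treated as a failure
        if not healthy:
            failed.append(check)
    return failed

def should_declare_disaster(checks: List[Check], min_critical_failures: int = 2) -> bool:
    """Declare a disaster when enough critical checks fail (the threshold is an assumption)."""
    critical = [c for c in failed_checks(checks) if c.critical]
    return len(critical) >= min_critical_failures

# Stubbed probes standing in for real device, log, and hardware checks.
checks = [
    Check("core-switch", lambda: False),
    Check("app-error-rate", lambda: False),
    Check("db-server-hardware", lambda: True),
    Check("staging-dashboard", lambda: False, critical=False),
]
print([c.name for c in failed_checks(checks)])
print(should_declare_disaster(checks))
```

Separating critical from non-critical checks keeps a failing side system from triggering a full disaster declaration on its own.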
7. Outline an Action Response Procedure
Once disaster strikes, the disaster recovery environment needs to be activated as soon as possible. An action response procedure outlines how to fail over to the disaster recovery site, with all the required steps. Even if your recovery process uses DRaaS or a disaster recovery tool to launch your disaster site automatically, you still need to prepare the action response procedure in writing to specify how the necessary services will be started, verified, and controlled.
Additionally, spinning up production services in another location is not sufficient. Ensuring that all the required data is in place and that all the required business applications are functioning properly is equally critical.
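A written action response procedure maps naturally onto an ordered runbook in which every step is started and then verified before the next one begins. The sketch below assumes a hypothetical `Step` structure and stubbed actions; it is not tied to any DRaaS product.

```python
from typing import Callable, List, NamedTuple

class Step(NamedTuple):
    name: str
    action: Callable[[], None]   # starts or switches something
    verify: Callable[[], bool]   # confirms the step actually worked

def run_failover(steps: List[Step]) -> List[str]:
    """Execute steps in order; stop at the first step whose verification fails."""
    completed = []
    for step in steps:
        step.action()
        if not step.verify():
            raise RuntimeError(f"verification failed at step: {step.name}")
        completed.append(step.name)
    return completed

# Stubbed state standing in for a real recovery site.
state = {"db": False, "app": False, "dns": False}
steps = [
    Step("restore database", lambda: state.update(db=True), lambda: state["db"]),
    Step("start application", lambda: state.update(app=True), lambda: state["app"]),
    Step("switch DNS to recovery site", lambda: state.update(dns=True), lambda: state["dns"]),
]
print(run_failover(steps))
```

Because each step carries its own verification, the run confirms that data and applications are in place rather than only that services were launched.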
8. Get Ready for Failback to Primary Infrastructure
Once the disaster is over, moving data and business services back to the primary location takes a lot of effort. Plan for a potential partial disruption of your business during the failback process. Fortunately, there are disaster recovery solutions that provide unified failback to the primary location, activated automatically or manually once you have verified the primary IT location.
9. Do the Extensive Tests
Testing your disaster recovery plan is mandatory but often neglected. Failover tests are usually complex and can lead to data loss and disruption of production services, so most companies don’t test their disaster recovery plan on a regular basis.
To understand how well your disaster recovery plan will work, you must schedule regular failover tests. Skipping these tests can put your entire business at risk when a disaster strikes: you may end up unable to recover in time, or unable to recover at all. Performance tests also help you assess whether your secondary location can withstand the business load.
10. Keep Your Disaster Recovery Plan Updated
Last but not least, just as testing the disaster recovery plan is mandatory, so is keeping all the disaster recovery documents updated. At the end of each test, review what happened and how your teams handled the test, and document your findings.
A good disaster recovery plan will help your company recover lost data and hasten your organization’s return to normal business operations. It will also help ensure that a disaster does not trigger adverse financial consequences or major business disruptions.
This post was written by Renju Thampy