Security
Commentary (August, 2008)

Protecting Apps and Data
with a Disaster Recovery Plan

by Peter Gregory

Disaster recovery is often overlooked until it’s too late. Take steps now to protect your organization and its applications from disasters; a resilient application architecture might even make disaster recovery unnecessary.
 

Organizations that develop software tend to spend most of their time thinking about their current programs, short- and long-term changes, application architecture, and — occasionally — support. They rarely think about application availability and resilience, or disaster scenarios — those are things that happen to other organizations that made poor location decisions or suffered from bad luck. While there are locations that are more prone to disasters, bad things can happen to good organizations, no matter where they are located.

Most small and medium-sized organizations are wholly unprepared for even the most basic response should a disaster occur. Many tend to think that disaster recovery planning is only for large organizations, that it is too costly to undertake, or that a disaster cannot happen to them.

Software developers can play a role in improving the survivability of the applications they build. By being aware of the types of disasters that can occur and ways in which applications can be made more resilient and recoverable, developers can help influence how current and future applications are designed and implemented.

How Disasters Affect Applications

Disasters are events that threaten to disable IT applications in any of several ways, including:

  • Directly damaging information processing equipment and facilities.
  • Disrupting public utilities such as electric power, natural gas, and water. Industrial-level processing centers usually have their own electric generators, but many businesses host their own applications, particularly those used internally, and those sites may have no backup power.
  • Disrupting telecommunications infrastructure. When telecom facilities are damaged in a disaster, it can be difficult or impossible to connect remotely to applications. Support and operations personnel may also be unable to reach vital apps, and customers will not be able to contact the organization.
  • Disrupting transportation infrastructure. Damage to transportation facilities will delay shipments and will also hamper employees’ ability to report to work and customers’ ability to patronize businesses.

Disasters are categorized by type:

  • Natural. These are the typical “acts of God,” including earthquakes, hurricanes, landslides, tsunamis, volcanoes, fires, floods, and many other events that damage buildings, roads, utilities, and communications facilities.
  • Manmade. These disasters are caused by humans through action, inaction, and errors. Examples include equipment failures, administrative errors, software bugs, utility outages, strikes, riots, chemical spills, transportation accidents, building collapses, sabotage, terrorism, and war.

The bottom line in these scenarios is that software applications are either down or unreachable. But what can software professionals do about it?

How to Protect Software and Data

The purpose of a disaster recovery plan is to protect and improve the availability of critical applications. Software and data need to be protected against the types of losses that disasters can inflict.

Management will need to develop a disaster recovery plan that starts with a Business Impact Analysis, a process of identifying the most critical business processes and their supporting IT systems. The next step is to set two targets for each application: the Recovery Time Objective (RTO), which is the maximum period of time that an application will be “off the air” before it is recovered in the same or a different processing center, and the Recovery Point Objective (RPO), which is the maximum amount of data loss that can be tolerated.

RTO and RPO need to be carefully chosen: values too small will increase the cost of application resiliency, while values too large mean that it will take too long to recover applications or that too much data will be lost. The process of developing these targets is discussed in detail in chapter three of my book, IT Disaster Recovery Planning for Dummies.
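
To make the two targets concrete, the following is a minimal Python sketch that checks a proposed backup and recovery design against an RTO and an RPO. The figures (a four-hour RTO, a one-hour RPO, a three-hour restore estimate) and the names `RecoveryTargets` and `meets_targets` are hypothetical and chosen only for illustration; real targets come out of the Business Impact Analysis.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    """Targets from a Business Impact Analysis (values used below are hypothetical)."""
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data loss, expressed as a span of time


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic copies, a failure just before the next copy
    loses everything written since the previous one."""
    return backup_interval


def meets_targets(targets: RecoveryTargets,
                  backup_interval: timedelta,
                  estimated_recovery_time: timedelta) -> tuple:
    """Return (rpo_ok, rto_ok) for a proposed backup and recovery design."""
    rpo_ok = worst_case_data_loss(backup_interval) <= targets.rpo
    rto_ok = estimated_recovery_time <= targets.rto
    return rpo_ok, rto_ok


targets = RecoveryTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))

# Nightly backups with a three-hour restore estimate.
nightly = meets_targets(targets, timedelta(hours=24), timedelta(hours=3))

# Copies every 15 minutes with the same restore estimate.
frequent = meets_targets(targets, timedelta(minutes=15), timedelta(hours=3))
```

In this sketch the nightly schedule satisfies the RTO but not the RPO, which illustrates why the two targets have to be evaluated separately.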

Some of the technology-related steps that need to be taken to protect software and data include:

  • Standardize server OS images. While this doesn’t appear to be directly related to application software and data, standardizing server operating system (OS) images helps system administrators to more quickly build replacement servers so that they can help get applications running again after a disaster.
  • Server virtualization. Using a server virtualization product such as VMware not only facilitates the standardization of server OS configuration but also enables more flexibility in how an environment is recovered after a disaster.
  • Replication. If your organization has more than one site for processing information, then you will want to consider employing real-time (or near real-time) data replication from your primary site to your secondary site.
  • Backups. Doing backups means taking copies of software and data and writing them to removable, portable media that can easily be transported to another location if necessary. Backup tape has been the medium of choice for decades, although disk-based backup is quickly being adopted because of dropping costs for disk-based media and the speed at which backups and recovery can be performed.
  • Remote storage of backup media. Backup media that is stored nearby runs the risk of being damaged or destroyed in a disaster. For this reason, it is important to store backup media in a safe location that is far enough away from the processing center so that it is not harmed in a common disaster, but close enough that it can be quickly transported to the processing center when needed for a recovery.
  • E-Vaulting. This is an emerging technique for backing up data. Important software and data are transmitted over data networks to a storage system at a remote site. Organizations can purchase e-vaulting software and perform this on their own infrastructure, or use any of several online services that perform this for corporate customers.
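
The backup, remote storage, and e-vaulting steps above share one requirement: the copy held elsewhere must match the original. The following Python sketch copies a file to a second location and verifies it by checksum. It assumes the remote vault is reachable as an ordinary directory path, which is a simplification; an actual e-vaulting product or service would use its own transfer mechanism. The names `sha256_of` and `vault_file` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def vault_file(source: Path, vault_dir: Path) -> Path:
    """Copy `source` into `vault_dir` and verify the copy by checksum.

    `vault_dir` stands in for remote storage; here it is just a directory.
    """
    vault_dir.mkdir(parents=True, exist_ok=True)
    destination = vault_dir / source.name
    shutil.copy2(source, destination)
    if sha256_of(source) != sha256_of(destination):
        # Remove the bad copy so it is never mistaken for a valid backup.
        destination.unlink()
        raise IOError(f"checksum mismatch while vaulting {source}")
    return destination
```

Verifying at copy time does not replace periodic restore tests, but it catches a corrupted transfer before the media is relied upon in a recovery.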

The Role of Application Architecture

Although backups, replication, and e-vaulting play their part, an application’s architecture plays a significant role in an organization’s ability to easily recover an application — but more than that, a resilient application architecture might make recovery unnecessary. For example, server clustering and data replication can be used to build alternate instances of applications that can be ready to assume production duties within hours, or even instantly if that’s what a particular application requires.
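
As a rough illustration of an alternate instance assuming production duties, the Python sketch below picks the first healthy instance from a priority-ordered list. It is a simplified model under stated assumptions, not a description of any clustering product: the instance names, the site labels, and the `is_healthy` callback are all hypothetical, and a real cluster would also have to handle data consistency and traffic redirection.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Instance:
    name: str
    site: str


def choose_active(instances: Sequence[Instance],
                  is_healthy: Callable[[Instance], bool]) -> Optional[Instance]:
    """Return the first healthy instance in priority order, or None."""
    for instance in instances:
        if is_healthy(instance):
            return instance
    return None


# Priority order: primary site first, then the standby at the alternate site.
cluster = [Instance("app-1", "primary"), Instance("app-2", "secondary")]

# Simulate a disaster that takes the primary site offline.
down_sites = {"primary"}
active = choose_active(cluster, lambda inst: inst.site not in down_sites)
```

With the primary site marked as down, the standby at the secondary site is selected; if no instance is healthy, the function returns `None` and a ground-up recovery is needed.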

One important aspect of an application is session management, particularly if your recovery time objective is measured in minutes and you need to preserve each active user’s session even if your production servers are disabled or unreachable in a disaster. It can be tempting to use a Web server’s user session management features. But unless real-time data about user sessions can be replicated to other servers in a cluster, you might want to manage user session state right in the application and store session data in the replicated database.
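
One way to manage session state in the application is sketched below in Python. It assumes a single `sessions` table and uses the standard-library `sqlite3` module purely as a stand-in for the replicated database; replication itself is not shown. The class name `DatabaseSessionStore` and the table layout are hypothetical.

```python
import json
import sqlite3
from typing import Optional


class DatabaseSessionStore:
    """Keeps session state in a database table rather than Web server memory.

    sqlite3 stands in for the replicated database; replication is not shown.
    """

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "session_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self._db.commit()

    def save(self, session_id: str, state: dict) -> None:
        # INSERT OR REPLACE is SQLite syntax for an upsert on the primary key.
        self._db.execute(
            "INSERT OR REPLACE INTO sessions (session_id, state) VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self._db.commit()

    def load(self, session_id: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT state FROM sessions WHERE session_id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


# Two store objects sharing one connection model two application servers
# that read and write the same database.
db = sqlite3.connect(":memory:")
server_a = DatabaseSessionStore(db)
server_b = DatabaseSessionStore(db)
server_a.save("abc123", {"user": "pat", "cart": [42]})
restored = server_b.load("abc123")
```

Because the session lives in the database rather than in one server's memory, a second server can pick up the same session after the first becomes unreachable, provided the database itself has been replicated to the surviving site.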

Often, attention to resilience and survivability is focused on a production application environment, while little consideration is given to an application’s source code.

Just like application data, source code must be backed up, and backup media stored offsite. Depending upon how software in the production environment is implemented, source code may or may not be required in a production environment recovery situation; this needs to be taken into account when planners develop ground-up recovery procedures.

Disaster recovery planning is a hot topic in many organizations and industries, and sometimes it’s even mandated by standards and regulations. There are some good sources of information where you can learn more about disaster recovery planning, including:

  • IT Disaster Recovery Planning for Dummies. Published in 2008, this is an easy-to-read book that will take you through the entire disaster recovery planning lifecycle.
  • DRI International. Provides education programs and certifications (www.drii.org).
  • Disaster Recovery Journal. An extensive website with articles on disaster recovery planning; also published as a print magazine (www.drj.com).
  • Business Continuity Management Institute. This organization specializes in education and certifications on disaster recovery planning and business continuity planning (www.bcm-institute.org).

Peter Gregory, CISA, CISSP, is a career technologist and a security and risk manager at a financial management company in Redmond, Wash. He is the author of 20 books on security and technology, including IT Disaster Recovery Planning for Dummies and Solaris Security.

 
 
 
Copyright © 1999-2010 Software Magazine and King Content Co.