Email is becoming ubiquitous and has become the standard tool for communication in many enterprises, big and small. Microsoft is the dominant player in the messaging platform market through its Exchange Server. Enterprises are clearly choosing the reliability, scalability, and performance of Exchange, combined with the feature-rich Microsoft Outlook and Outlook Web Access clients and built-in collaboration services for workflow and other applications.
Email has become a mission-critical application for most businesses today and it has long been a challenge to backup and restore email information. If a crash occurs and if the data is not restored, it can have devastating consequences for a business. So it is imperative for companies to effectively backup and recover data and protect them from huge losses in productivity and downtime.
High Availability Solutions for Exchange Servers
Failover Clustering
Microsoft Clustering enables users to prevent hardware failures by stringing redundant hardware, called nodes, together through a central cluster manager that coordinates load balancing and data activity. Typically, nodes share common storage space and have the capability of picking up load off of a node that goes down due to hardware or software malfunction. There are two types of cluster environments—active/active and active/passive. In the former, every node in the environment is live and capable of processing requests. When one active node goes down, the others simply process more requests as the load is evenly dispersed across the remaining nodes. In the latter, there is a single active node that processes all incoming requests. Upon hardware or software failure in the active node, the passive node is immediately and automatically brought up by the cluster manager to take over the normal function of processing data requests. In this way, hardware exposure is mitigated through physical hardware redundancy.
Microsoft Exchange Server supports both active-active and active-passive cluster environments. Exchange Server Clustering provides high availability by protecting against a node failure. However, it does not prevent against storage failures. Given the size of typical cluster environments, multiple hard disks are used to build large storage arrays. In Network and System Administration, when large numbers of any one device are used, failure is expected. When a hard disk fails, application disruption is unavoidable, as all the nodes in the cluster could be using that one particular disk as shared storage which contains all files, including Exchange Server database files. As protection against this particular failure, RAID configurations are common. However, from a performance standpoint, this significantly slows down I/O in the subsystems due to writing the data to multiple disks at the same time. Administrators have to balance such performance degradation and understand that this particular implementation has limitations. Again, the RAID option is to protect against any hard disk failure but it cannot prevent site disasters.
In direct contrast to this storage dependency, using other replication approaches prevent against hardware, software and storage failures. Failover servers are normally installed on unique, usually geographically independent, Exchange Servers which serve as a barrier to failures of any type.
Exchange Server Clustering environments are more cost-intensive compared to the Standby option. The primary reason for this is the high hardware and software requirements. Clustering requires Windows NT Enterprise Edition, Windows 2000 Advanced Server or Windows 2003 Enterprise Edition and Exchange Server Enterprise Edition. Additionally, it only supports hardware listed on the Microsoft Hardware Compatibility list. On the other hand, a Standby or Failover server does not have any special hardware requirements and is simply a software solution to meet disaster recovery needs. As an additional cost, LAN connectivity is required between the Exchange Server cluster nodes to send and receive what is called a heartbeat signal, among other communications. This signal is used by each node to determine if other nodes are still available. In case any node is not available the remaining nodes take over. With Standby, LAN or WAN network connectivity will work to replicate Exchange Server mailboxes. The speed of this process is directly related to the size of the mailboxes and network bandwidth.
File or Block Level Replication
Different kinds of replication techniques can be used to replicate data between two servers both locally and remotely. In block level, replication is performed by the storage controllers or by mirroring the software. In file-system level (replication of file system changes), the host software performs the replication. In both block and file level replication, it does not matter what type of applications are getting replicated. They are basically application agnostic, but some vendors do offer solutions with some kind of application specificity. But some of the disadvantages are:
Typically, identical hardware/software in both production and replicated servers are needed.
Possibility of virus/corruption getting propagated from production server to replicated server.
Exchange 2007 Built-in High Availability Features
Exchange Server 2007 includes four features that provide high availability for Mailbox servers: Local Continuous Replication (LCR), Cluster Continuous Replication (CCR), Single Copy Clusters (SCC) and Standby Continuous Replication (SCR).
- Local Continuous Replication (LCR): LCR is a single-server solution that uses built-in asynchronous log shipping technology to create and maintain a copy, or replica, of a storage group on a second set of disks that are connected to the same server as the production storage group. LCR provides log shipping, log replay, and a quick manual switch to a secondary copy of the data.
- Cluster Continuous Replication (CCR): CCR is a clustered solution that uses built-in asynchronous log shipping technology to create and maintain a storage group replica on a second server. CCR is designed to be either a one or two datacenter solution, providing both high availability and site resilience.
- Single Copy Clusters (SCC): SCC is a clustered solution that uses a single copy of a storage group on storage that is shared between the nodes in the cluster. SCC is very similar to clustering in previous versions of Exchange Server, with some significant changes and improvements.
- Standby Continuous Replication (SCR): SCR is designed for scenarios that use or enable the use of standby recovery servers. SCR enables a separation of high availability and site resilience. SCR can be combined with CCR to replicate storage groups locally (using CCR for high availability) and remotely in a secondary site (using SCR for site resilience).
These high availability features provide good functionality but one has to be an experienced user of Exchange server to implement them. Also, here are some of the constraints one will face when implementing the built-in high availability of features of Exchange 2007.
- Exchange Server 2007 runs on a 64-bit machine and hence costs more.
- For best performance, it is recommended that Active Directory Domain Controllers also run on a 64-bit machine, but it is not mandatory.
- No support for Exchange 2000 and Exchange 2003.
- The replicated server is in a passive mode and cannot be accessed for reporting, monitoring and archival purposes.
- It cannot create replication for all storage groups at one time.
- It is a must to have only one mailbox store in a Storage Group, otherwise Exchange 2007 Replication will not work.
Mailbox Replication Approach
In this approach, the replication is done at a mailbox level and it is very application specific. One can pick and choose the mailboxes that need to be replicated. One can set up a granular plan for key executives, sales and IT people, in which the replication occurs more frequently to achieve the required Recovery Point Objective (RPO) and Recovery Time Objective (RTO). For everyone else in the company, another plan can be set up where the replication intervals are not that frequent.
Another advantage of this approach is that the replicated or failover server is in an Active mode. The failover server can be accessed for reporting and monitoring purposes. With other replication approaches, the failover server is in a Passive mode and cannot be used for maintenance, monitoring or reporting purposes.
Backup and Replication
Some solutions offer both backup and replication as part of a single solution. In this case, the backup is integrated with replication and the users get a two-in-one solution. Considered two-tier architecture, these solutions consist of an application and agent environment. The application server also hosts the network share that stores all the backup files. The files are stored on this network share and not on any particular target server so as to prevent loss of backup files. If the target server goes down, users would like to continue to access their backup files in order to rebuild the target server with as little downtime as possible.
Figure 1
The mailboxes will be backed to the backup server and then replicated to the remote failover server. The full backup and restore is done first and then only the changes will be applied through incremental. For restoring emails and mailboxes, the local backup data can be used and for disaster recovery purposes, the remote failover server can be utilized.
Failover/Failback
When a disaster strikes the primary site, then all the users will be failed over to the remote site. Once the primary is rebuilt, one has to go through the failback process. The only way to make sure that your disaster recovery solution works is to test it periodically. Unfortunately, to do that one has to failover the entire Exchange server. Exchange Administrators will be leery about doing this for fear of crashing the production Exchange server. With the mailbox replication approach, one can create a test mailbox and use it for failover/failback testing periodically.
Conclusion
Companies are impacted adversely with significant loss of productivity and revenue when an Exchange server goes down. With increasing dependence of business on Exchange server, customers are demanding instant failover to a local or remote server. This concept may mean survival of business in case of a major destruction. High availability and disaster recovery of Exchange servers should be taken seriously and companies should implement the proper solution to protect them. One has to choose the appropriate solution based on their needs to protect the Exchange servers.