Thursday, December 25, 2008

Protecting Microsoft Exchange in Physical & Virtual Environments

Introduction

For many companies, email has become a more important communication tool than the telephone. Internal employee communication, vendor and partner communication, email integration with business applications, collaboration using shared documents and schedules, and the ability to capture and archive key business interactions all contribute to the increasing reliance on email.

Businesses of all sizes, from multinational enterprises to small and midsize businesses, are using the messaging and collaboration features of Microsoft Exchange to run business functions that, if lost even for a short time, can result in severe business disruption. No wonder Exchange has become a critical application for so many businesses. When these businesses look at high availability solutions to protect key business applications, Exchange is often the first application targeted for protection.

Improving the availability of Exchange involves reducing or eliminating the many potential causes of downtime. Planned downtime is less disruptive since it can be scheduled for nights or weekends - when user activity is much lower. Unplanned downtime, on the other hand, tends to occur at the worst possible times and can impact the business severely. Unplanned downtime can have many causes including hardware failures, software failures, operator errors, data loss or corruption, and site outages. To successfully protect Exchange you need to ensure that no single point of failure can render Exchange servers, storage or network unavailable. This article explains how to identify your failure risk points and highlights industry best practices to reduce or eliminate them, depending on your organization’s Exchange availability needs, resources and budget.
Exchange Availability Options

Most availability products for Exchange fall into one of three categories: traditional failover clusters, virtualization clusters and data replication. Some solutions combine elements of both clustering and data replication; however, no single solution can address all possible causes of downtime. Traditional and virtualization clusters both rely on shared storage and the ability to run applications on an alternate server if the primary server fails or requires maintenance. Data replication software maintains a second copy of the application data, at either a local or remote site, and supports either manual or automated failover to handle planned or unplanned server failures.

All of these products rely on redundant servers to provide availability. Applications can be moved to an alternate server if a primary server fails or requires maintenance. It is also possible to add redundant components within a server to reduce the chances of server failure.

Get Rid Of Failover – Get Rid Of Downtime

Most availability products rely on a recovery process called “failover” that begins after a failure occurs. A failover moves application processing to an alternate host, either after an unplanned failure or by operator command to accommodate planned maintenance. Failovers are effective in bringing applications back online reasonably quickly, but they do result in application downtime, the loss of in-process transactions and in-memory application data, and the possibility of data corruption. Even a routine failover results in minutes or tens of minutes of downtime, including the time required for application restart and, after an unplanned failure, data recovery. In the worst case, software bugs or errors in scripts or operational procedures can result in failovers that do not work properly, and downtime can extend to hours or even days. Reducing the number of failovers, shortening their duration, and ensuring that the failover process is completely reliable all contribute to reducing Exchange downtime.

Local server redundancy and basic failover address the most common failures that cause unplanned Exchange downtime. However, data loss or corruption, and site disruptions, although less common, can cause much longer outages and require additional solution elements to properly address.
Evaluate Unplanned Downtime Causes

Unplanned downtime can be caused by a number of different events:

-Catastrophic server failures caused by memory, processor or motherboard failures
-Server component failures including power supplies, fans, internal disks, disk controllers, host bus adapters and network adapters
-Software failures of the operating system, middleware or application
-Operator errors
-Data loss or corruption
-Site problems such as power failures, network disruptions, fire, flooding or natural disasters

Each category of unplanned downtime is addressed in more detail below.
How to Avoid Server Hardware Failures

Server core components include power supplies, fans, memory, CPUs and main logic boards. Purchasing robust, name brand servers, performing recommended preventative maintenance, and monitoring server errors for signs of future problems can all help reduce the chances of failover due to catastrophic server failure.

Failovers caused by server component failures can be significantly reduced by adding redundancy at the component level. Robust servers are available with redundant power and cooling. ECC memory, with the ability to correct single-bit memory errors, has been a standard feature of most servers for several years. Newer memory technologies including advanced ECC, online spare memory, and mirrored memory provide additional protection but are only available on higher-cost servers. Online spare and mirrored memory can increase memory costs significantly and may not be cost effective for many Exchange environments.

Internal disks, disk controllers, host bus adapters and network adapters can all be duplicated. However, adding component redundancy to every server can be both expensive and complex.

Reduce Storage Hardware Failures

Storage protection relies on device redundancy combined with RAID storage to protect data access and data integrity from hardware failures. There are distinct issues for both local disk storage and for shared network storage.

Critical Moves To Protect Your Local Storage

Local storage is only used for static and temporary system data in a clustering solution. Data replication solutions maintain a copy of all local data on a second server. However, failure of unprotected local storage will result in an unplanned server failure, introducing the downtime and risks involved in a failover to an alternate server. For local storage, it is quite easy to add extra disks configured with RAID 1 protection. It is critical that a second disk controller is also used and that the disks within each RAID 1 set are connected to separate controllers. Other RAID levels, such as RAID 5, are not recommended for local disk storage because data can be lost or corrupted if the controller's write cache is lost.
Secure Your Shared Storage

Shared storage depends on redundancy within the storage array itself. Fortunately, storage arrays from many storage vendors are available with full redundancy that includes disks, storage controllers, caches, network controllers, power and cooling. Redundant, synchronized write caches available in many storage arrays allow the use of performance-boosting write caching without the data corruption risks associated with single write caches. It is critical, however, that only fully-redundant storage arrays are used; lower-cost, non-redundant storage array options should be avoided.

Access to shared storage relies on either a fibre channel or Ethernet storage network. To assure uninterrupted access to shared storage, these networks must be designed to eliminate all single points of failure. This requires redundancy of network paths, network switches and network connections to each storage array. Multiple host bus adapters (HBAs) within each server can protect servers from HBA or path failures. Multipath IO software, required for supporting redundant HBAs, is available in many standard operating systems (including MPIO for Windows) and is also provided by many storage vendors; examples include EMC PowerPath, HP Secure Path and Hitachi Dynamic Link Manager. But these competing solutions are not universally supported by all storage network and storage array vendors, often making it difficult to choose the correct multipath software for a particular environment. This problem becomes worse if the storage environment includes network elements and storage arrays from more than a single vendor. Multipath IO software can be difficult to configure and may not be compatible with all storage network or array elements.
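
As a hedged illustration, the native Windows MPIO framework can be enabled from PowerShell before any vendor-specific multipath modules are configured; this sketch assumes Windows Server 2008 R2 with the ServerManager module, and vendor packages such as EMC PowerPath use their own installers instead.

    # A minimal sketch: install the built-in Windows multipath IO framework
    Import-Module ServerManager
    Add-WindowsFeature Multipath-IO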
Say Goodbye to Networking Failures

The network infrastructure itself must be fault-tolerant, consisting of redundant network paths, switches, routers and other network elements. Server connections can also be duplicated to eliminate failovers caused by the failure of a single server component. Take care to ensure that the physical network hardware does not share common components. For example, dual-ported network cards share common hardware logic and a single card failure can disable both ports. Full redundancy requires either two separate adapters or the combination of a built-in network port along with a separate network adapter.

Software to control failover and load sharing across multiple adapters falls into the category of NIC teaming and includes many different options: fault tolerance (active/passive operation with failover), load balancing (multiple transmit with single receive) and link aggregation (simultaneous transmit and receive across multiple adapters). Load balancing and link aggregation also include failover.

Choosing among these configuration options can be difficult and must be considered along with the overall network capabilities and design goals. For example, link aggregation requires support in the network switches and includes several different protocol options including Gigabit EtherChannel and IEEE 802.3ad. Link aggregation also requires that all connections be made to the same switch, opening a vulnerability to a switch failure.
Minimize Software Failures

Software failures can occur at the operating system level or at the Exchange application level. In virtualization environments, the hypervisor itself or virtual machines can fail. In addition to hard failures, performance problems or functional problems can seriously impact Exchange users, even while all of the software components continue to operate. Beyond proper software installation and configuration along with the timely installation of hot fixes, the best way to improve software reliability is the use of effective monitoring tools. Fortunately, there is a wide choice of monitoring and management tools for Exchange available from Microsoft as well as from third parties.
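
As one hedged example, Exchange 2007 includes a set of test cmdlets in the Exchange Management Shell that can be scripted into a lightweight health check; dedicated monitoring suites add alerting and trending on top of checks like these.

    # A minimal sketch using built-in Exchange 2007 test cmdlets
    Test-ServiceHealth        # verifies that the required Exchange services are running
    Test-MailFlow             # submits a test message and measures delivery
    Test-MAPIConnectivity     # verifies MAPI logons against the mailbox databases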

Reduce Operator Errors

Operator errors are a major cause of downtime. Proven, well-documented procedures and properly skilled and trained IT staff will greatly reduce the chance for operator errors. But some availability solutions can actually increase the chance of operator errors by requiring specialized staff skills and training, by introducing the need for complex failover script development and maintenance, or by requiring the precise coordination of configuration changes across multiple servers.
Secure Yourself from Site-Wide Outages

Site failures can range from an air conditioning failure or leaking roof that affects a single building, to a power failure that affects a limited local area, to a major hurricane that affects a large geographic area. Site disruptions can last anywhere from a few hours to days or even weeks. While site failures are less common than hardware or software failures, they can be far more disruptive.

A disaster recovery solution based on data replication is a common way to protect Exchange from a site failure while minimizing the downtime associated with recovery. A data replication solution that moves data changes in real time and optimizes wide area network bandwidth will result in a low risk of data loss in the event of a site failure. Solutions based on virtualization can reduce hardware requirements at the backup site and simplify ongoing configuration management and testing.

For sites located close enough to each other to support a high-speed, low-latency network connection, solutions offering better availability with no data loss are another option.
Failover Reliability

Investments in redundant hardware and availability software are wasted if the failover process is unreliable. It is obviously important to select a robust availability solution that handles failovers reliably and to ensure that your IT staff is properly skilled and trained. Solutions need to be properly installed, configured, maintained and tested.

Some solution features that contribute to failover reliability include the following:

-Simple to install, configure and maintain, placing a smaller burden on IT staff time and specialized knowledge while reducing the chance of errors
-Avoidance of scripting or failover policy choices that can introduce failover errors
-Detection of actual hardware and software errors rather than timeout-based error detection
-Guaranteed resource reservation versus best-effort algorithms that risk resource overcommitment
Protect Against Data Loss and Corruption

There are problems of data loss and corruption that require solutions beyond hardware redundancy and failover. Errors in application logic or mistakes by users or IT staff can result in accidentally deleted files or records, incorrect data changes and other data loss or integrity problems. Certain types of hardware or software failures can lead to data corruption. Site problems or natural disasters can result in loss of access to data or the complete loss of data. Beyond the need to protect current data, both business and regulatory requirements add the need to archive and retrieve historical data, often spanning several years and multiple types of data. Full protection against data loss and corruption requires a comprehensive backup and recovery strategy along with a disaster recovery plan.

In the past, backup and recovery strategies have been based on writing data to tape media that can be stored off-site. However, this approach has several drawbacks:

-Backup operations require storage and processing resources that can interfere with production operation and may require some applications to be stopped during the backup window
-Backup intervals typically range from a few hours to a full day, with the risk of losing several hours of data updates that occur between backups
-Using tape backup for disaster recovery results in recovery times measured in days, an unacceptable level of downtime for many organizations

Data replication is a better solution for both data protection and disaster recovery. Data replication solutions capture data changes from the primary production system and send them, in real time, to a backup system at a remote disaster site, at the local site, or both. There is still the chance that a system failure can occur before data changes have been replicated, but the exposure is in seconds or minutes rather than hours or days. Data replication can be combined with error detection and failover tools to help get a disaster recovery site up and running in minutes or hours, rather than days. Local data copies can be used to reduce tape backup requirements and to separate archival tape backup from production system operation to eliminate resource contention and remove backup window restrictions.

Consider Issues That Cause Planned Downtime

Hardware and software reconfiguration, hardware upgrades, software hot fixes and service packs, and new software releases can all require planned downtime. Planned downtime can be scheduled for nights and weekends, when system activity is lower, but there are still issues to consider. IT staff morale can suffer if off-hour activity is too frequent. Companies may need to pay overtime costs for this work. And application downtime, even on nights and weekends, can still be a problem for many companies that use their systems on a 24/7 basis.

Using redundant servers in an availability solution can allow reconfiguration and upgrades to be applied to one server while Exchange continues to run on a different server. After the reconfiguration or upgrade is completed, Exchange can be moved to the upgraded server with minimal downtime. Most of the work can be done during normal hours. Solutions based on virtualization, which can move applications from one server to another with no downtime, can reduce planned downtime even further. Be aware that changes to application data structures and formats can preclude this type of upgrade.
Added Benefits of Virtualization

The latest server virtualization technologies, while not required for protecting Exchange, do offer some unique benefits that can make Exchange protection both easier and more effective.

Virtualization makes it very easy to set up evaluation, test and development environments without the need for additional, dedicated hardware. Many companies cannot afford the additional hardware required for testing Exchange in a traditional, physical environment but effective testing is one of the keys to avoiding problems when making configuration changes, installing hot fixes, or moving to a new update release.

Virtualization allows resources to be adjusted dynamically to accommodate growth or peak loads. The alternative is to buy enough extra capacity upfront to handle expected growth, but this can result in expensive excess capacity. On the other hand, if the configuration was sized only for the short-term load requirements, growth can lead to poor performance and ultimately to the disruption associated with upgrading or replacing production hardware.

Managing Exchange Certificates

Introduction

Certificates can be used to encrypt the communication flow between two endpoints (both clients and servers). Certificates can also be used by these endpoints to authenticate themselves to each other. Exchange 2007 uses X.509 certificates for authentication and for encryption. X.509 certificates follow a standard format published by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).

An X.509 certificate is issued by a Certificate Authority (CA) that binds the public key to a designated Distinguished Name, formatted according to the X.500 tradition, or to one or more so-called Subject Alternative Names.

There are several components in Exchange 2007 that rely on certificates for encryption, authentication or both. In this article I will provide you with an overview of the different Exchange components that use certificates. I will then go deeper into the features of the self-signed certificate that is generated by default. In part 2 of this article I will cover the naming requirements of a certificate you need to keep in mind when getting your certificates. To end, in part 3 of this article I will take a closer look at the different Exchange Management Shell cmdlets that are available to create, manage, and remove Exchange certificates.

Certificate Usage by Exchange Server 2007 Components

As already stated, several Exchange Server 2007 components rely on X.509 certificates for encryption, authentication or both. You will notice that when you install the Exchange 2007 Hub Transport, Client Access, Unified Messaging, or Edge Transport server role, Exchange will by default create a self-signed certificate so that the components that need one can function as required.

Figure 1 below shows you the self-signed certificate that is created by Exchange during the installation of the Exchange 2007 Client Access, Hub Transport, and Unified Messaging server roles. This certificate will be used by the following services: IIS, SMTP, POP, IMAP, and UM.

Figure 1: Self Signed Certificate created by default when installing the Exchange 2007 HUB, CAS, UM server role

Hub/Edge Transport server role and certificates

Transport Layer Security between Active Directory sites

The Exchange 2007 Hub Transport server role uses a certificate to encrypt all SMTP traffic between Active Directory sites. It is not possible to configure Exchange to allow unencrypted SMTP traffic between Hub Transport servers, located in different sites.

In order to see which certificate is used between two Hub Transport servers located in different Active Directory sites, you can enable SMTP protocol logging on the intra-organization Send connector on every Hub Transport server, as you can see in figure 2 below, by using the Exchange Management Shell cmdlet Set-TransportServer.
Figure 2: Setting IntraOrgConnectorProtocolLogging to verbose
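
For reference, the setting shown in Figure 2 comes down to a single Exchange Management Shell command; the server name Ex2007SE is taken from this example environment.

    # Enable verbose protocol logging on the implicit intra-organization Send connector
    Set-TransportServer -Identity "Ex2007SE" -IntraOrgConnectorProtocolLoggingLevel Verbose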

By setting the so-called IntraOrgConnectorProtocolLoggingLevel to verbose, protocol logging will be added to the Send connector protocol log. After sending a mail from a mailbox homed in Site B to a mailbox located on an Exchange 2007 Mailbox server in Site A, looking at the Send protocol log reveals that the Exchange Hub Transport server in Site B (Ex2007SE) uses the certificate offered by the Exchange Hub Transport server in the destination Active Directory site (Ex2007EE) to start Transport Layer Security, as can be seen in Figure 3.


Figure 3: Send Protocol Log between Active Directory Sites

A quick look at the certificate available for TLS on the Hub Transport server shows that the certificate used is a self-signed certificate (Figure 4).


Figure 4: Self Signed Certificate
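
You can list the certificates a Hub Transport server can offer for TLS with Get-ExchangeCertificate; a minimal sketch of that check:

    # Show which certificates are enabled for SMTP and whether they are self-signed
    Get-ExchangeCertificate | Format-List Thumbprint, Services, Subject, CertificateDomains, IsSelfSigned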

EdgeSync

Once EdgeSync is configured between your internal Hub Transport servers and the Edge Transport server(s), both servers will use a certificate to encrypt their communication. In addition, both certificates are used to provide direct trust. Direct trust is a method of authentication where a certificate can be used for authentication when that certificate is present in Active Directory (for the Hub Transport server role) or ADAM/AD LDS (for the Edge Transport server role). When EdgeSync is set up, the required certificates are published in the correct location.

Opportunistic Transport Layer Security

Whenever an SMTP server opens a connection to the Exchange 2007 Hub/Edge Transport server role, Exchange will allow for opportunistic TLS by offering its certificate.
Domain Security

Certificates can also be used by the Hub/Edge Transport server to configure Domain Security with partner organizations, both for encryption and authentication.
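
A minimal sketch of such a Domain Security configuration, assuming a hypothetical partner domain partner.example.com and a Send connector named Internet (your connector names and domains will differ, and the matching Receive connector must also have domain security and TLS enabled):

    # Add the partner domain to the sender and receiver domain-secure lists
    Set-TransportConfig -TLSSendDomainSecureList partner.example.com -TLSReceiveDomainSecureList partner.example.com
    # Require mutual TLS on the Send connector used to reach the partner
    Set-SendConnector -Identity "Internet" -DomainSecureEnabled $true -RequireTLS $true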

Client Access Server role and certificates

Client Access

Certificates are used by the Client Access server role to allow the communication flow to be encrypted between the Client Access server and its different clients. By default SSL is required for:

-Outlook Web Access
-Outlook Anywhere
-Exchange ActiveSync
-POP3
-IMAP4
-Exchange web services such as Autodiscover, EWS, and Unified Messaging

Figure 5: Require SSL

The only virtual directory for which the use of a certificate is not required by default is the one that makes the Offline Address Book available for download by Microsoft Office Outlook 2007 clients and later.

Figure 6: OAB Virtual Directory does not require SSL by default

Certificate Based Authentication

It is possible to configure certificate-based authentication, thereby allowing clients to authenticate themselves against the Client Access server by using their personal certificate.

Unified Messaging Server Role and Certificates

Certificates are used by the Unified Messaging Server role to encrypt the communication when sending a recorded Voice Mail message to the Exchange Hub Transport Server role. Certificates can also be used to encrypt the SIP and/or RTP traffic to the UM IP Gateway, and have to be used when you decide to deploy Office Communications Server in your environment, since Office Communications Server only communicates with other server roles through encryption.

What is all this about the Self-Signed Certificate?

When you deploy any Exchange 2007 server role except the Mailbox server role, Exchange will generate a self-signed certificate and enable it for the IIS, SMTP, POP3, IMAP4, and UM services where required.

Characteristics of this Self-Signed Exchange Certificate

Let us have a look at some of the characteristics of this self-signed certificate that is generated by default.

Self-Signed certificates are only valid for one year

Self-Signed certificates are valid for one year, as can be seen in Figure 7, and will need to be renewed after a year.

Figure 7: Self-Signed Certificate only valid for one year
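
You can check the expiry date directly from the Exchange Management Shell; a small sketch:

    # List the validity period of the installed Exchange certificates
    Get-ExchangeCertificate | Format-Table Thumbprint, NotAfter, IsSelfSigned -AutoSize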

To renew a Self-Signed certificate, you can use the Exchange Management Shell cmdlet New-ExchangeCertificate. If you first grab the existing certificate by running Get-ExchangeCertificate, you can pipe the object to the cmdlet New-ExchangeCertificate, which will generate a new Self-Signed Certificate with the same settings, and enable it for the same services by default.

In Figure 8 you can see how the existing Self-Signed Certificate is renewed.

Figure 8: Renew an existing Self-Signed Certificate
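
The renewal shown in Figure 8 boils down to piping the existing certificate into New-ExchangeCertificate; a sketch with a placeholder thumbprint:

    # Clone the existing self-signed certificate into a new one with the same settings and services;
    # replace the placeholder thumbprint with the value shown by Get-ExchangeCertificate
    Get-ExchangeCertificate -Thumbprint "AB12CD34..." | New-ExchangeCertificate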

The Exchange 2007 Client Access server only allows one certificate to be enabled for use with IIS, but you can have multiple certificates enabled for POP, IMAP, UM, and SMTP. When multiple certificates are available, Exchange will select a certificate based on different criteria. I will come back to this certificate selection process in part 2 of this article.

Self-Signed Certificate has by default one Common Name and two Subject Alternative Names

The Self-Signed certificate that is created when deploying Exchange 2007 will have its common name set to the Host name of the Exchange server, and have two Subject Alternative Names set to its Host name and its Fully Qualified Domain Name.

Figure 9: Self-Signed Certificate and its Subject and CertificateDomains

It is possible, however, to generate a Self-Signed Certificate with another Subject and other Subject Alternative Names to make sure it can be used in your Exchange organization.

Using the Exchange Management Shell cmdlet New-ExchangeCertificate, you can for example create a certificate with Common Name webmail.proexchange.global, and then specify Subject Alternative Names such as the Exchange server's host name and Fully Qualified Domain Name, as seen in Figure 10.

Do not forget to add the boolean parameter PrivateKeyExportable and set it to True, if you want to be able to export this Self-Signed certificate to enable your users to trust it (full details on this in part 2 of the article).

Figure 10: Generating a new Self-Signed Certificate with customized Subject Alternative Names
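
A hedged sketch of the command behind Figure 10; the host name and fully qualified domain name shown here are assumed from this example environment, so adjust all names to your own organization.

    # Create a self-signed certificate with a custom Subject and extra Subject Alternative Names,
    # keeping the private key exportable
    New-ExchangeCertificate -SubjectName "cn=webmail.proexchange.global" `
        -DomainName webmail.proexchange.global,ex2007se,ex2007se.proexchange.global `
        -PrivateKeyExportable $true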

In part 2 of this article, I will come back to the required names of a certificate. In part 3 I will explain in more detail the used cmdlets.

A Self-Signed Certificate is only trusted by its issuer

It is very important to know that the Self-Signed certificate is only trusted by the issuer of the certificate itself, which could break Exchange functionality if not configured correctly. Let us see what you need to consider if you decide to use the Self-Signed certificate:

Outlook Anywhere and Exchange ActiveSync do not support the use of a self-signed certificate

The Autodiscover web service will not trigger a certificate warning when you launch Microsoft Office Outlook 2007 from a domain-joined client PC, but Outlook will complain about the certificate if you use it from a non-domain-joined client PC, as shown in Figure 11.

Figure 11: Self-Signed certificate not trusted

When Microsoft Office Outlook 2007 clients (domain-joined or not) use the Exchange Web Services provided by the Microsoft Exchange Client Access server, Outlook will warn them that the certificate was issued by a company they have not chosen to trust. Figure 12 shows the Security Alert that appears when someone requests Free and Busy information.

Figure 12: Self-Signed Certificate not trusted

Microsoft does support the use of Self-Signed certificates, but only for internal scenarios, like:

-To encrypt SMTP sessions between Hub Transport servers in different sites
-To encrypt SMTP sessions between Hub Transport servers and Edge Transport servers
-To encrypt the synchronization of configuration and recipient information by configuring EdgeSync between internal Hub Transport servers and Edge Transport server(s)
-To encrypt SMTP sessions between Unified Messaging servers and Hub Transport servers
-To encrypt SIP and RTP sessions between Unified Messaging servers and Office Communications servers (this does require you to make sure that the Mediation Server in Office Communications Server trusts your Exchange server as the issuer of that Self-Signed certificate)
-To encrypt internal client access to Exchange (POP, IMAP, Outlook Web Access)

If you do not want Exchange to generate a self-signed certificate during installation, you can specify the /NoSelfSignedCertificates parameter when running Setup from the command prompt. Be careful: this parameter can only be used when installing the Client Access server role or the Unified Messaging server role. If your server does not have a valid certificate available to encrypt communication between clients and the Client Access server or the Unified Messaging server, communication will be unencrypted and therefore insecure.
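
As a hedged sketch of that command line, assuming an unattended installation of the Client Access server role only (adjust the role list to your own deployment):

    # Install the Client Access server role without generating a self-signed certificate
    .\Setup.com /mode:Install /roles:ClientAccess /NoSelfSignedCertificates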

Summary

In the first part of this 3-part article on certificates and Exchange, you have seen which Exchange 2007 components use certificates, and what characteristics the self-signed certificate carries. In part 2 of this article I will show how you can trust the self-signed certificate and I will cover the requirements of a certificate you need to keep in mind when getting your certificates. To end, in part 3 of this article I will give you a close look at the different Exchange Management Shell cmdlets that are available to create, manage, and remove Exchange certificates.