Tune Up Exchange Server with the Best Practices Analyzer
Jon Avner and Paul Bowden
At a Glance:
- Introducing Exchange Server Best Practices Analyzer
- Keeping ExBPA updated with the latest issues
- Commonly revealed configuration problems
- Additional tools for Exchange troubleshooting
In early 2004, the Exchange team began taking a hard look at its support costs and what could be done to address the most common customer pain points. This analysis involved the development team, the support group, the Microsoft IT department (MSIT), and several other teams within the company.
At this time, we began to formalize a plan for addressing problem areas in a systematic fashion. This involved a number of approaches: fixing problem areas in our currently shipping code, redesigning problem areas in our next release, creating more explicit documentation to cover such areas, and developing tools to better automate troubleshooting and management of such problem areas. While we have made considerable progress in all the approaches over the past couple of years, this article concentrates on tool development work.
Prior to this formalized effort, tool development was ad hoc, with various developers or support engineers writing code in their spare time to simplify tasks they had to do regularly. Gradually, some of these utilities would become more widely used, get more complex, and eventually find their way into customer hands. There was very little management of this process, and it sometimes resulted in confusion—both for internal employees and for customers.
Although this kind of development has by necessity continued, we now also have a more structured approach to tool development. The biggest problem areas in existing Exchange deployments are generally concerned with configuration, disaster recovery, performance, mail flow, and public folder management. We have tried to systematically develop a small set of highly flexible, powerful tools to address these issues.
Exchange Server Best Practices Analyzer
Our initial findings indicated that over half of our support calls had a root cause that was related to product misconfiguration. If you've worked extensively with Exchange Server, you probably know that Microsoft has a wealth of Knowledge Base articles and whitepapers that explain how to customize and tune various parts of the messaging system to meet your specific requirements. However, some of the changes are made using low-level registry or Active Directory® tools. In these cases, the system won't tell you if you've entered a bad value.
Other types of common misconfigurations occur when third-party software has been installed on an Exchange server, but not properly tuned for that particular environment. For example, antivirus settings are a common source of pain if the correct directory exclusions are not in place. Sometimes these changes can cause problems, especially when combined with other environmental factors. The result can be suboptimal performance, limited scalability, and poor availability.
We decided that what we needed was a mechanism to analyze the setup of an Exchange server, allowing us to understand the overall topology where it was installed, so we could provide useful feedback to the administrator, detailing any changes that could be made to improve the health and performance of the server.
To address this need, we began developing a tool that would come to be called the Exchange Server Best Practices Analyzer (ExBPA). The goal was to automate the analysis of an Exchange configuration and apply a set of rules that would look for known misconfiguration issues. We knew two things going in: the Exchange configuration was highly complex and spread across many different data sources, and the set of rules would keep growing as Exchange came to be used in more varied topologies and for more varied purposes.
We concluded that we needed to develop an engine that could process a configuration file, defining what data to gather and what rules to use to analyze that data. This would allow us to add new data points and rules quite easily. Plus, with a tool that is configuration-driven, we could easily update it for customers on a regular basis. We chose XML for the configuration format because it seemed a natural fit. We also knew we needed to make the engine extensible, so we could easily add ways to get at new data sources as needed.
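To make the configuration-driven idea concrete, here is a minimal Python sketch of such an engine. It is an illustration only, not the ExBPA implementation: the XML element and attribute names, the setting names, and the dictionary-based collectors are all assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration format: the element and attribute names below
# are invented for illustration and are not the real ExBPA schema.
CONFIG = """
<config>
  <setting name="CircularLogging" source="directory" key="circularLogging"/>
  <setting name="DbOnCompressedVolume" source="filesystem" key="dbVolumeCompressed"/>
  <rule setting="CircularLogging" equals="true" severity="error"
        text="Circular logging is enabled"/>
  <rule setting="DbOnCompressedVolume" equals="true" severity="warning"
        text="Database files are on a compressed volume"/>
</config>
"""

# Stand-ins for real data sources (registry, Active Directory, and so on).
# Each collector is a plain dictionary mapping a key to a collected value.
COLLECTORS = {
    "directory": {"circularLogging": "true"},
    "filesystem": {"dbVolumeCompressed": "false"},
}


def run_scan(config_xml, collectors):
    """Gather every configured setting, then apply every configured rule."""
    root = ET.fromstring(config_xml)

    # Step 1: collect the data points named in the configuration.
    data = {}
    for setting in root.findall("setting"):
        source = collectors[setting.get("source")]
        data[setting.get("name")] = source.get(setting.get("key"))

    # Step 2: evaluate each rule against the collected data.
    findings = []
    for rule in root.findall("rule"):
        if data.get(rule.get("setting")) == rule.get("equals"):
            findings.append((rule.get("severity"), rule.get("text")))
    return findings


if __name__ == "__main__":
    for severity, text in run_scan(CONFIG, COLLECTORS):
        print(f"{severity}: {text}")
```

In a design like this, adding a data point or a rule means editing only the XML, and supporting a new data source means registering one more collector. The engine code itself stays unchanged.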
Figure 1 ExBPA Architecture
The development of ExBPA began in earnest in early 2004. By the time we shipped the first version of the product in September of 2004, we had a highly flexible, extensible, and generic engine along with a very simple and well-designed user interface. We also had encoded nearly 1,000 data points being retrieved from over a dozen data sources (including the registry, Active Directory, the IIS metabase, and Windows® Management Instrumentation). We had approximately 700 rules to analyze that data, and had published several hundred detailed articles on the Exchange Server Tech Center explaining the meanings of critical rules and what customers should do if they encounter one. Figure 1 shows the architecture of the finished ExBPA solution.
ExBPA was an immediate success, with over 15,000 downloads in the first week (which is a lot for an Exchange Server tool), and it has since been downloaded several hundred thousand times. In the past year, we shipped another major version and three minor revisions of the tool, in addition to publishing monthly updates to the rule set. Meanwhile, we have released or started development on several more tools that address problem areas.
From Problems to Solutions
There are a number of key features that make ExBPA successful. For starters, it's auto-updatable. The first thing the tool does when launched is check for updates over an Internet connection. Functionally, this works very much like a virus scanner update, except ExBPA pulls down an XML configuration file rather than a virus signature file.
We established an Exchange tools newsgroup along with internal and external mailing lists for getting feedback on ExBPA. This is a handy source of suggestions for new features and rules. Combining auto-update capabilities with feedback channels gives us a virtuous cycle, which leads to continuous improvement over time. Of particular use has been our internal feedback list, which gets input from several dozen contributors from the product development and support teams. Based mainly on the suggestions we get from this discussion group, we have been adding new rules at the rate of approximately 50 per month since the tool shipped.
Even a year after our initial version went out the door, we still get two or three new rule suggestions per day. (Did we mention that configuring Exchange properly was a complex business?) The discussion group is also highly useful as a forum for debate: when a new rule is suggested, several contributors usually chime in on its merits and on whether it should be adjusted for certain situations, such as the server role or hardware capabilities.
Another key feature of ExBPA is its ease of use. It should take less than five minutes to get up and running. The tool has minimal prerequisites: .NET Framework 1.1 and, optionally, the IIS common files for accessing remote metabase information. Rather than installing the tool on a server, you install it on a workstation. It then gathers data through existing remote interfaces. (ExBPA can be installed on the server and run locally, but most customers prefer not to touch servers except when necessary.) The tool itself requires minimal configuration (see Figure 2) and can be started and run with the default settings in three mouse clicks. And it's safe to run at any time since it requires little overhead and performs only unobtrusive operations.
Figure 2 Setting Options for a Scan
The final key feature is the way ExBPA links to detailed articles. When a scan completes, you are shown a report listing all the issues that were found. In the report, each issue is only briefly described. But for each issue, there is a link that will take you to an Exchange Tech Center article that details everything you need to know about the issue and what to do to fix it (see Figure 3). If the computer does not have access to the Internet, the articles will be accessed from the local help file, which gets downloaded along with each new configuration file.
Figure 3 Accessing Detailed Solution Articles
Getting the articles from the Exchange Tech Center has two important advantages. First, the Exchange team can see the top issues customers are running into. (See the sidebar "ExBPA Hit Parade" to read about the most common issues customers are researching on Tech Center.) We can use the information to adjust future planning or training to make sure customers know about the issues so they can either avoid them in the future or deal with them as quickly as possible (ideally, before the issues cause problems).
The second advantage is that this mechanism ties customers to pertinent documentation in a more direct manner than has ever been done before. When it comes to Exchange, there is no shortage of documentation. But the volume can be overwhelming to a customer, making it difficult to sift through everything and find just those topics that are relevant to the customer's needs.
ExBPA in Action
The first production Exchange topology to be analyzed by ExBPA was the internal Exchange deployment at Microsoft. We went through the output with a fine-tooth comb, looking for tool problems and real misconfiguration issues. Of course, we found both. The Microsoft environment is a great proving ground for the tool, and even to this day one of our release criteria for tool updates is to run a check of our own organization to ensure that the tool still functions correctly.
In the early days, we found a lot of false positives. This is where a rule would misfire because the logic didn't take all operating conditions into account. Some rules needed to be conditioned so that they could understand the version of the operating system, the build of Exchange, the server role, the number of mailboxes on the server, and so on. Implementing the conditioning for each rule really brought home the complexity of systems management.
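The conditioning described above can be pictured as a guard that decides whether a rule applies at all, kept separate from the check itself. The following Python sketch assumes a simplified data model. The `ServerFacts` fields, the setting name, and the 500-mailbox threshold are hypothetical values chosen for illustration and do not come from ExBPA.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServerFacts:
    """Facts gathered about one server (hypothetical, simplified model)."""
    role: str                  # for example "mailbox" or "front-end"
    mailbox_count: int
    settings: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Rule:
    text: str
    applies_to: Callable[[ServerFacts], bool]   # the conditioning
    is_violated: Callable[[ServerFacts], bool]  # the actual check


def evaluate(rules: List[Rule], facts: ServerFacts) -> List[str]:
    """Report a rule only if it applies to this server AND is violated."""
    return [r.text for r in rules if r.applies_to(facts) and r.is_violated(facts)]


# Hypothetical rule: only meaningful on mailbox servers above a size threshold.
RULES = [
    Rule(
        text="Virtual memory settings are not tuned",
        applies_to=lambda f: f.role == "mailbox" and f.mailbox_count > 500,
        is_violated=lambda f: not f.settings.get("vm_tuned", False),
    ),
]

if __name__ == "__main__":
    big_mailbox = ServerFacts(role="mailbox", mailbox_count=2000)
    front_end = ServerFacts(role="front-end", mailbox_count=0)
    print(evaluate(RULES, big_mailbox))  # rule fires
    print(evaluate(RULES, front_end))    # rule is skipped, so no false positive
```

Without the `applies_to` guard, the same check would fire on the front-end server as well, which is the kind of false positive the early versions had to eliminate.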
Many Exchange administrators run ExBPA once a week to make sure that their Exchange servers are humming along nicely. Others use the tool in conjunction with Microsoft Operations Manager (MOM) 2005 to perform a daily health check. When running ExBPA with MOM, those administrators get an aggregated view of all operational and configuration issues within the Exchange topology.
Common Problems Revealed
We love hearing about how customers have benefited from running ExBPA in their own environments. We came across one administrator who was forced to reboot his Exchange servers every two weeks; otherwise, users would get non-delivery messages. This routine had been going on for a number of years, and he had simply accepted it as an inconvenient necessity. By running ExBPA, he discovered that the virtual memory configuration of the server was not optimized and, like a well-used hard disk, the memory address space would fragment each day until it hit a critical point. The tool recommended a simple registry change and the problem was solved: no more biweekly reboots.
In another case, we had a customer report that the tool was malfunctioning because it identified that one of the Exchange servers in his topology was running on Microsoft Virtual PC. After some investigation we found out that the Exchange server was indeed running in a virtualized environment. While this administrator was away on vacation, the IT team experienced issues with the physical hardware and virtualized the environment to get things up and running again. Unfortunately, they had forgotten to tell anyone about this change! Like this guy, many Exchange administrators get quite a surprise when they view the results of their scan.
Here are just a few other examples of conditions ExBPA has detected:
- An admin had enabled circular logging on an Exchange cluster containing over 12,000 user mailboxes. This would have ruled out any chance of data recovery in the event of a major failure. It turns out the customer had enabled circular logging several months earlier to avoid log growth while moving some mailboxes, but forgot to turn it off afterwards.
- One customer was using a WAN connection to service Exchange directory requests. The result, not surprisingly, was very poor overall performance.
- A user's registry value was entered incorrectly (it had been entered in hex rather than in decimal). This problem resulted in the operating system crashing at random intervals.
- In one particular Exchange deployment, the database files were being stored on a compressed volume, causing overall poor performance.
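The hex-versus-decimal mistake in the list above is easy to make, because registry editing tools typically let you enter a numeric value in either base. This short Python sketch shows how far apart the two readings of the same typed digits can be. The digits "3000" are an arbitrary example, not the value from the customer case.

```python
def interpret(typed: str, base: int) -> int:
    """Return the number that results from reading the typed digits in a base."""
    return int(typed, base)


typed = "3000"                   # arbitrary example digits
intended = interpret(typed, 10)  # what the administrator meant (decimal)
stored = interpret(typed, 16)    # what gets stored if the field is read as hex

if __name__ == "__main__":
    print(f"intended: {intended}")
    print(f"stored:   {stored}")  # 12288, roughly four times the intended value
```

A rule that compares the stored value against a documented valid range can flag this kind of error even though the registry itself accepts the value without complaint.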
It's been interesting to find that ExBPA is used in organizations of all sizes. At one end of the spectrum are users running Microsoft Small Business Server (SBS) who use ExBPA as a quick troubleshooter. At the other extreme are our largest Exchange Server customers, with several hundred thousand mailboxes in the organization, using the tool to oversee their vast geographic topologies and to double-check the server configuration that's implemented by local support teams.
ExBPA was created to address one of our top problems in Exchange: misconfiguration issues. Building on its success, we have created or started work on several additional tools to address our other major problem areas.
Microsoft Exchange Server Disaster Recovery Analyzer
Disaster recovery is a complex process in Exchange. To address this difficulty, we designed the Disaster Recovery Analyzer to step you through the process of disaster recovery. This doesn't replace the documentation, but it's simple to follow and leads to fewer errors. Our initial version targets the tricky area of log file replay, but we have plans to cover all aspects of disaster recovery in the future.
Microsoft Exchange Server Performance Analyzer
This tool attempts to diagnose performance problems occurring on an Exchange server. Such problems are often difficult to locate and usually require expert knowledge to resolve. The initial version targets the source of server busy pop-up messages that occur on client machines when the server is not performing well.
Microsoft Exchange Server Public Folder Analyzer
This tool is already informally available to customers as PFDAVAdmin. It provides a little more functionality for managing public folders than is available in Exchange System Manager. We have gone through the tool and have done formal testing and design reviews as well as some general cleanup work to ensure it offers a level of quality comparable to our other offerings.
Microsoft Exchange Server Mailflow Analyzer
This tool is still in early design stages. The tool will focus on diagnosing problems with inbound or outbound mail flow. The scenarios that will actually be covered in the initial version are still to be determined.
The analyzer engine used by some of these tools has a lot of similarities to the engine we wrote for ExBPA. While ExBPA does a large-scale blanket check of a system, looking for anything that may be wrong, these tools start with a task or symptom and perform a series of small steps designed to proceed towards a resolution.
The configuration for the analyzer defines these steps; for each step, it defines an action to be taken. This action may consist of displaying a UI to get user input, or it may provide a fragment of ExBPA configuration to get system input. Each step also defines a set of rules. These rules are processed by the analyzer component of ExBPA and the results of those rules determine what step to perform next. In this fashion, we can encode the steps for performing tasks such as disaster recovery or performance troubleshooting in a simpler and more flexible fashion than if we just hardcoded all the logic.
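The step-driven flow described above resembles a small state machine: each step runs an action, then its rules pick the next step. The Python sketch below illustrates that shape under stated assumptions. The step names, the symptom text, and the dictionary layout are hypothetical, and the two action functions stand in for a real UI prompt and real data collection.

```python
def ask_symptom(ctx):
    # Stands in for displaying a UI; a real tool would prompt the user.
    ctx["symptom"] = ctx["answers"]["symptom"]


def collect_log_state(ctx):
    # Stands in for a fragment of data-collection configuration.
    ctx["logs_missing"] = ctx["system"]["logs_missing"]


# Hypothetical step table. Each step has an action, an ordered list of
# (rule, next_step) pairs, and a fallback step used when no rule matches.
STEPS = {
    "ask_symptom": {
        "action": ask_symptom,
        "rules": [
            (lambda c: c["symptom"] == "database will not mount", "collect_log_state"),
        ],
        "otherwise": "out_of_scope",
    },
    "collect_log_state": {
        "action": collect_log_state,
        "rules": [(lambda c: c["logs_missing"], "advise_restore")],
        "otherwise": "advise_replay",
    },
}


def run(steps, ctx, start="ask_symptom"):
    """Walk the steps until reaching a name with no entry (a final outcome)."""
    path = [start]
    current = start
    while current in steps:
        step = steps[current]
        step["action"](ctx)
        # The first rule that matches decides the next step.
        current = next(
            (target for test, target in step["rules"] if test(ctx)),
            step["otherwise"],
        )
        path.append(current)
    return path


if __name__ == "__main__":
    context = {
        "answers": {"symptom": "database will not mount"},
        "system": {"logs_missing": False},
    }
    print(" -> ".join(run(STEPS, context)))
```

Because the flow lives in a data table rather than in branching code, a new troubleshooting path can be added by editing the table, which mirrors the flexibility argument made above.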
These analyzer engine-based tools will be maintained similarly to the way ExBPA is maintained. We expect to perform regular updates of the configuration that the tool will pick up automatically, and we also plan to have occasional updates to the binaries when there's major new functionality to add. Although the initial versions of these tools will have only a small subset of the capabilities we eventually want to offer, over time we should be able to round them out and cover all or nearly all the scenarios that give customers problems.
ExBPA continues to be the flagship support tool for Exchange Server. We plan to continue publishing regular updates of the configuration, and there is still no real end in sight as to the number of rules we will be adding over time. The ExBPA engine has also been packaged into an SDK that other product teams can take advantage of to build tools of their own, and several products already do just that.
For the next version of Exchange Server, ExBPA is expected to be more tightly integrated with the product, having a direct link to it from the administration console and possibly shipping with Exchange. There's also the possibility that ExBPA functionality may be built into the Exchange setup to perform sophisticated pre- and post-installation checks.
In the meantime, you can grab your own copy of ExBPA from the Microsoft Exchange Server Web site (Microsoft Exchange Server Analyzer Tools) and start checking your Exchange deployment today. Even if you are confident that your Exchange Server is properly set up, there's nothing to be lost by taking this one extra measure: run ExBPA and find out for sure.
Jon Avner has been developing messaging software for nearly 20 years. He joined Microsoft in early 1997 and is currently Software Development Engineering Lead for the Exchange Support Tools Development Team.
Paul Bowden has designed communications systems based on cc:Mail, MHS, and X.400. He is currently a Program Manager in the Exchange Support Tools Team, and speaks at shows such as TechEd and IT Forum.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.