Exchange Analyzer Tools Success Stories
For months now, different people at Microsoft® have been writing articles, blogs, and newsgroup posts about how to use the Microsoft Exchange Analyzer Tools to help troubleshoot problems. In case you’ve missed these, let me give you a quick summary of what these tools do.
The Microsoft Exchange Best Practices Analyzer is designed for administrators who want to determine the overall health of their Exchange servers and topology. The tool scans Exchange servers and identifies items that do not comply with Microsoft best practices.
The Microsoft Exchange Troubleshooting Assistant can help determine the cause of performance, mail flow and database mounting issues on computers running Microsoft Exchange Server. The tool automates specialized troubleshooting steps for identified symptoms.
In addition to being available for download, both of these tools will be in the Exchange Server 2007 Toolbox, accessed from the Exchange Management Console.
There is a lot of enthusiasm around these tools, and we’d like it to be infectious. I’d like to share some of the ways that Product Support Services has used these tools to help troubleshoot support requests. This month, I’ll tell you about support cases in which Product Support Services used the Exchange Troubleshooting Assistant components, and next month I’ll talk about some Exchange Best Practices Analyzer success stories.
Please keep in mind that these examples are taken from cases in which we in Product Support Services were still learning how to use these tools and, in some cases, only just becoming aware that they existed. That’s why you’ll see some comments that say “after days of trying to solve the problem, I ran the Analyzer tool and solved the problem immediately.” In those cases, the Support Engineer had probably only just learned of the tool.
Exchange Troubleshooting Assistant
The Exchange Troubleshooting Assistant (ExTRA) consists of the following three components.
- Exchange Performance Troubleshooting Analyzer (ExPTA)
- Exchange Mail Flow Analyzer (ExMFA)
- Exchange Disaster Recovery Analyzer (ExDRA)
When you open the Microsoft Exchange Troubleshooting Assistant, you can choose Performance Troubleshooter, Mail Flow Troubleshooter, or Database Recovery Management. I’ve grouped these success stories by the ExTRA component that was used to troubleshoot the problem.
Performance Problems
Performance problems can be the most troublesome and time-consuming problems to fix. Support Engineers do not generally take one case and work on it exclusively until it is completed. In most cases there is some back-and-forth between the customer and the Support Engineer as they try different solutions and check different settings. Performance cases can take weeks to resolve.
The Performance Troubleshooting Analyzer has had a big effect on how long it takes to resolve these problems. Without using the Exchange Troubleshooting Assistant, it takes an average of 42 minutes to troubleshoot a problem. With the Exchange Troubleshooting Assistant, that average time is reduced to 12 minutes.
The following table shows the effect that ExPTA has had on troubleshooting performance problems. The times depicted are the amounts of time that the Support Engineer recorded working on a case.
Measure | Using ExPTA | Manual |
---|---|---|
Average time | 12 minutes | 42 minutes |
Minimum time | 4 minutes | 3 minutes |
Maximum time | 1 hour 16 minutes | 8 hours 23 minutes |
Median time | 10 minutes | 33 minutes |
Solution delivered on first contact | 12 cases | 3 cases |
Unresolved cases | 1 case | 19 cases |
One thing that really stands out to me is that the maximum time to solve a case using ExPTA was only about 30 minutes longer than the average time to solve a case without ExPTA. The other thing that stands out is how often we can solve the problem the first time that the customer and the Support Engineer make contact. I think this shows that as ExTRA is used more frequently before customers call Product Support Services, we’ll see the number of performance cases decrease.
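The comparison can be checked with a few lines of arithmetic. The following is a minimal sketch that only restates the figures from the table above in minutes; the variable and function names are made up for illustration.

```python
# Times from the table above, converted to minutes.
expta = {"average": 12, "minimum": 4, "maximum": 1 * 60 + 16, "median": 10}
manual = {"average": 42, "minimum": 3, "maximum": 8 * 60 + 23, "median": 33}


def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


avg_saving = percent_reduction(manual["average"], expta["average"])
median_saving = percent_reduction(manual["median"], expta["median"])
# How far the longest ExPTA case ran past the manual average.
gap = expta["maximum"] - manual["average"]

print(f"Average time reduced by {avg_saving:.0f}%")      # 71%
print(f"Median time reduced by {median_saving:.0f}%")    # 70%
print(f"Longest ExPTA case exceeds the manual average by {gap} minutes")  # 34
```

In other words, both the average and the median recorded time dropped by roughly 70 percent, and the longest ExPTA case ran 34 minutes past the manual average.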
Symptom: Outlook RPC pop-up box
Root Cause: Client Restrict operation
Comments from the Support Engineer: The ExPTA report pointed to the client’s use of the Restrict operation, which is part of the process of requesting that Exchange create a view on a folder or set of folders (effectively, a database table with associated criteria). If the folder or set of folders already has a view with a matching restriction, Exchange uses the existing view to satisfy the user request. If no view has a matching restriction, Exchange creates a new view. Creating a view is more costly than using an existing view. The issue was isolated within two days of ExPTA being run.
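The reuse-or-create behavior that the Support Engineer describes can be pictured as a cache keyed by restriction. The sketch below is a toy model only: the `Folder` class, its `restrict` method, and the message fields are hypothetical and do not represent the Exchange store or its API.

```python
class Folder:
    """Toy model of a folder that caches views by restriction (hypothetical)."""

    def __init__(self, messages):
        self.messages = messages
        self._views = {}        # restriction -> cached list of matching messages
        self.views_created = 0  # counts how often the costly path is taken

    def restrict(self, field, value):
        key = (field, value)
        if key not in self._views:
            # No view with a matching restriction: build a new one (costly).
            self._views[key] = [m for m in self.messages if m.get(field) == value]
            self.views_created += 1
        # A matching view exists (or was just built): reuse it.
        return self._views[key]


inbox = Folder([
    {"sender": "alice", "unread": True},
    {"sender": "bob", "unread": False},
    {"sender": "alice", "unread": False},
])

inbox.restrict("sender", "alice")  # creates a view
inbox.restrict("sender", "alice")  # same restriction, reuses the view
inbox.restrict("unread", True)     # different restriction, creates another view
print(inbox.views_created)         # prints 2
```

A client that keeps issuing restrictions with new criteria never benefits from the existing views, so every request takes the costly path.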
Symptom: Outlook RPC pop-up box
Root Cause: High Database Average Read and Write times
Comments from the Support Engineer: I used ExPTA when onsite in front of the customer. Having the customer see this output in real time was excellent. Typically you would have to collect the perfmon data, manually examine it, compare each counter with the recommended thresholds in the performance white paper, and then present the results back to the customer. This tool saves a massive number of man-hours that are typically lost to isolating performance-based issues.
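The manual step that the engineer describes, comparing each counter with a recommended threshold, amounts to something like the sketch below. This is an illustration and not how ExPTA is implemented; the threshold values and sample readings are placeholders chosen for the example and are not taken from the performance white paper.

```python
# Placeholder thresholds in seconds. These are assumptions for illustration,
# NOT the values published in the performance white paper.
THRESHOLDS = {
    "Avg. Disk sec/Read": 0.020,
    "Avg. Disk sec/Write": 0.020,
}


def find_violations(samples, thresholds=THRESHOLDS):
    """Return {counter: average} for counters whose average exceeds its threshold."""
    violations = {}
    for counter, values in samples.items():
        limit = thresholds.get(counter)
        if limit is None or not values:
            continue  # no threshold known, or nothing collected
        average = sum(values) / len(values)
        if average > limit:
            violations[counter] = average
    return violations


# Made-up readings standing in for collected perfmon data.
samples = {
    "Avg. Disk sec/Read": [0.035, 0.041, 0.038],
    "Avg. Disk sec/Write": [0.008, 0.012, 0.010],
}

for counter, average in find_violations(samples).items():
    limit = THRESHOLDS[counter]
    print(f"{counter}: {average * 1000:.0f} ms exceeds {limit * 1000:.0f} ms")
```

With these sample readings, only the read latency counter is reported, because its average is above the placeholder limit while the write latency stays below it.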
Symptom: Outlook RPC pop-up box
Root Cause: SAN disk latencies
Comments from the Support Engineer: I requested that the customer run ExPTA as soon as I got the case, and the problem was isolated in one day.
Symptom: Outlook RPC pop-up box
Root Cause: Disk bottlenecks on the Exchange database server drive
Comments from the Support Engineer: I requested that the customer run ExPTA as soon as I got the case, and the problem was isolated in three days.
Mail Flow Problems
Unfortunately, I do not have the same data showing actual time savings for mail flow and database recovery issues that I do for performance issues. Overall there are fewer of these cases to choose from, and we have not had a chance to run the same study that we did for performance issues.
Symptom: Mail was queuing up over the routing group connector between two routing groups
Root Cause: FQDN of remote servers was incorrect
Comments from the Support Engineer: All four Exchange servers in the first routing group had queues to both servers in the second routing group. The application log showed event ID 4000, "unable to bind to the remote destination server in DNS." ExMFA helped us resolve this issue in a timely manner. After we corrected the FQDN on all SMTP servers involved and restarted the Routing service, the mail queues cleared.
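A quick way to spot this kind of misconfiguration is to check whether each server name actually resolves in DNS. The sketch below is a generic pre-check and does not reproduce what ExMFA does; the host names are hypothetical, and `fake_resolve` is a stand-in resolver so that the example runs without a network. Omitting the `resolve` argument uses the standard `socket.getaddrinfo` call instead.

```python
import socket


def unresolvable_hosts(fqdns, resolve=socket.getaddrinfo):
    """Return the names in `fqdns` that fail DNS resolution."""
    failed = []
    for name in fqdns:
        try:
            resolve(name, 25)  # port 25 is SMTP
        except socket.gaierror:
            failed.append(name)
    return failed


def fake_resolve(name, port):
    """Offline stand-in for DNS; only one hypothetical name is 'registered'."""
    known = {"exch01.corp.example.com"}
    if name not in known:
        raise socket.gaierror(f"cannot resolve {name}")
    return []


servers = ["exch01.corp.example.com", "exch02.corp.example.com"]
print(unresolvable_hosts(servers, resolve=fake_resolve))
# prints ['exch02.corp.example.com']
```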
Symptom: Mail stuck in post categorizer queue, InetInfo at 99 percent
Root Cause: Non-standard SMTP sinks
Comments from the Support Engineer: ExMFA enabled us to easily see non-standard SMTP event sinks on the server. It pointed us to the root cause of the problem.
Symptom: Mail stuck in queue
Root Cause: Default SMTP domain name change
Comments from the Support Engineer: ExMFA identified the name change for the default SMTP virtual server domain.
Database Recovery
Symptom: Mailbox and public folder stores dismounted and could not mount successfully
Root Cause: E00.log was missing
Comments from the Support Engineer: I started by asking the customer to download the ExTRA tool and run the ExDRA wizard. He sent me the XML output, which clearly stated that E00.log was missing and was required for the stores to mount successfully. ExDRA suggested locating the missing log, restoring from backup, or repairing the databases.
We located the missing log file in the antivirus quarantine folder and restored it to its original path. The stores mounted successfully, and the whole recovery took less than an hour.
ExDRA saved time and effort in analyzing the database headers and finding the missing logs. It also provided several suggestions for recovering from the disaster. The customer was very happy with ExDRA and decided to use it and its other functions for any future issues.
I will use ExDRA in all disaster recovery cases to help more customers recover from disasters faster than before.
Symptom: Database not mounting
Root Cause: Corrupted log file
Comments from the Support Engineer: ExDRA gave us the correct options with which to proceed. The server was up and running on the same day.
Symptom: Database not mounting
Root Cause: Log file sequence has reached the limit (E00FFFFF.log)
Comments from the Support Engineer: ExDRA gave us the correct options with which to proceed. The server was up and running within 50 minutes.
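The limit in the last case follows from the file name itself: the sequence number is written as five hexadecimal digits, so FFFFF is the largest value that fits. The sketch below shows that arithmetic. It assumes a name format of "E", a two-digit prefix, and five hex digits, inferred from E00FFFFF.log above; the function names are made up, and the sketch does not claim to reflect exactly when the product stops accepting new log files.

```python
import re

# Largest value that five hexadecimal digits can hold: 1,048,575.
MAX_GENERATION = 0xFFFFF


def log_generation(filename):
    """Extract the sequence number from a name such as 'E00FFFFF.log'."""
    match = re.fullmatch(r"E(\d{2})([0-9A-Fa-f]{5})\.log", filename)
    if match is None:
        raise ValueError(f"not a numbered log file: {filename}")
    return int(match.group(2), 16)


def generations_remaining(filename):
    """Return how many sequence numbers are left after this log file."""
    return MAX_GENERATION - log_generation(filename)


print(log_generation("E00FFFFF.log"))         # prints 1048575
print(generations_remaining("E00FFFFF.log"))  # prints 0
print(generations_remaining("E00FFF00.log"))  # prints 255
```

A check like this could warn an administrator while there is still headroom, rather than after the stores have dismounted.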
For More Information
As you can see, we’ve found the Exchange Analyzer tools to be very valuable in Exchange troubleshooting. So the next time that your Exchange server starts acting up, give these tools a try before you call Product Support Services. At a minimum, you’ll save some time because you can give us the reports when you open the Support Request. At best, you’ll find the problem yourself and save the time and money of getting it resolved.
Stay tuned for next month's Exchange Best Practices Analyzer success stories.