Internal SQL Server Diagnostics Tools, Part 1: PSSDiag
December 7, 2004
Summary: Ken Henderson explains how to use PSSDiag, a diagnostics collection tool for SQL Server. PSSDiag is one of several Microsoft Product Support Services tools recently released to the Web that are now available for personal use. (26 printed pages)
In the next few columns, I will cover some tools that Microsoft Product Support Services uses to troubleshoot and solve complex SQL Server-related problems. These have recently been released to the Web and are available for customer use outside the context of a support incident. This month's column will focus on PSSDiag, a diagnostics collection tool for SQL Server. In subsequent columns, I'll talk about Read80Trace, OStress, and perhaps one or two others.
PSSDiag is used by Microsoft Product Support Services (PSS) to collect diagnostic data. There are two main varieties: the internal version often sent to customers within the context of a support case, and the external version, recently made available for download from the Microsoft Web site. The two versions are built from the same code base, but differ a bit in terms of exposed features, default options, intended use, and so forth. Since you can freely download the external version, I'll keep the discussion on it in this column.
PSSDiag consists of two main components: the configuration application, DiagConfig.exe, and the collector service, PSSDiag.exe. DiagConfig.exe is a managed code application that creates configuration files or registry entries that PSSDiag.exe, a native code service application, reads to determine what diagnostic information to collect and which machine(s) to collect it from. PSSDiag.exe collects the requested information and logs it to a specified output folder, optionally compressing it in the process. This output can then be uploaded to Microsoft or used by customers for monitoring the health of their servers and troubleshooting server problems.
Although the tool can be configured to collect diagnostics from virtually any type of machine, it is configured specifically for SQL Server-related diagnostics by default. Collecting from other types of machines is as simple as supplying a special startup parameter to the collector, as I'll discuss in the next section.
DiagConfig begins by prompting for a target machine and diagnostic collection service, as illustrated in Figure 1.
Figure 1. DiagConfig connection dialog box
The target machine is the one from which you wish to collect diagnostic info. This need not be the local machine, but must have the PSSDiag collector service installed (this is normally handled automatically for you; see the Installing the Collector Service section for instructions on how to do this manually).
The Diag Collection Service entry specifies the name of the collector service on the target machine. This defaults to PSSDIAG, but can be changed (you can install multiple instances of the collector service on the same machine). Setting up multiple instances of the collector service would be handy, for example, if you wanted to set up one type of collection that ran around the clock gathering low-impact diagnostics and another more invasive collection that you ran ad hoc when troubleshooting specific problems. The name you specify here must already have been registered on the target machine as a collector service unless you are connecting to a default instance on the local machine. Again, see the Installing the Collector Service section for more info.
The ability to control and configure the collector service on a remote machine allows you to centrally manage diagnostic collection for the servers across an enterprise from a single machine. You can even specify the target machine name and collection service name on the DiagConfig command line to bypass the Target Machine dialog and automate the connection process. These parameters are positional and require no command-line switches; specify the target machine as the first parameter, and the target service name as the second one. This allows you to create shortcuts to your target machines in Windows Explorer and automatically connect to them by simply clicking the appropriate shortcut.
Once you've supplied the required connection info, DiagConfig presents the main configuration screen. Figure 2 illustrates:
Figure 2. DiagConfig main configuration screen
The main configuration interface is divided into four quadrants: Connection Info, Machine-wide Diagnostics, Instance-specific Diagnostics, and Instructions/Output. I'll cover each of these separately.
The Connection Info section allows you to further refine the target machine you've specified to a specific SQL Server instance on the machine. This defaults to an asterisk (*), which will instruct the collector service to enumerate the SQL Server instances installed on the target machine and collect from all of them simultaneously. You can change this to a particular instance on the target machine if you like. You can also specify what type of authentication to use when connecting to the target machine and an optional user name if using SQL Server authentication.
As the name implies, the Machine-wide Diagnostics section allows you to configure the collection of diagnostics that are machine-wide in scope—ones that are not specific to a particular SQL Server instance. These include the target machine's event logs (system, application, and security) as well as Perfmon/Sysmon logs.
For event logs, you can control whether they're collected when the collector starts, when it shuts down, or both. For Perfmon logs, you can configure the objects and counters that are collected (a shaded counter object in the treeview indicates that some, but not all, of the Perfmon counters under it are selected), the rollover file size, and the sampling interval. The counter list that's displayed here is actually loaded from an XML document (see the PerfmonTemplates subfolder), and you can customize it to suit your needs.
One thing that's worth mentioning here regarding Perfmon counters is that PSSDiag supports the notion of instance counters. These are Perfmon counters that you want to collect for all the SQL Server instances on a machine. These counters will have a %s token embedded in their text. This token will be expanded by the collector service and replaced with the name of each SQL Server instance on the target machine. It will cause the collector to behave as though you had requested the same counter for each of the SQL Server instances individually.
Another thing worth mentioning is PSSDiag's support for Perfmon object instance wildcards. These counters have an asterisk (*) used in place of a specific object instance in their counter text. This allows you, for example, to collect a given counter or set of counters for all the processes on a target box without knowing what those processes are in advance.
Instance-specific Diagnostics are diagnostics that pertain to a particular SQL Server instance. These include SQLDIAG reports, blocking troubleshooting output, and Profiler traces.
SQLDIAG is a utility that ships with SQL Server (check the Binn folder) that takes a snapshot of various system configuration and troubleshooting information specific to a particular SQL Server instance (e.g., error logs, sp_configure values, etc.). As with event logs, PSSDiag can be configured to collect SQLDIAG reports at collector startup, shutdown, or both.
PSSDiag can also automatically run and cycle the standard stored procedure that Microsoft uses to analyze blocking issues (see How to monitor SQL Server 2000 blocking). You can configure which parameters to pass into the procedure and what sampling interval to use.
The final instance-specific diagnostic that DiagConfig allows you to configure is Profiler tracing. It presents a simplified version of the SQL Profiler interface, and allows you to select which trace events to collect (a shaded trace event type indicates that some, but not all, of the trace events under it are selected), which trace template to use (these are XML documents that you can customize located in the TraceTemplates subfolder), and what rollover file size to use.
The last section of the main configuration screen consists of two parts: a set of basic instructions that describes how to get started with the tool, and an output window that displays the collector service's console output once you've started it. This is actually just a view of the target machine's event log that has been filtered to show only those events pertaining directly to the collector service and the output it has written to the log since it started collecting data.
Once you've configured the diagnostics you want to collect, click Start to begin collection. DiagConfig then presents a startup parameters dialog that allows you to supply parameters specific to the diagnostic collection you're about to initiate. Figure 3 illustrates:
Figure 3. DiagConfig Startup Parameters dialog box
There are numerous startup parameters that you can specify here for the collector service. I'll go through each one of them.
The Mode setting allows you to specify whether you'll be collecting from a machine that has SQL Server installed (and to which you can connect) or one that does not. When SQL Server mode is chosen, the collector begins by ensuring that it can connect to the target SQL Server instance and that it is a member of the sysadmin group within that instance. If either of these checks fails, the instance will be ignored for purposes of diagnostic collection. This means that if you accept the default target instance name of * (which causes the collector to enumerate the SQL Server instances installed on the target machine, as I said above), any instances that are not running at the time of collection or that have other connectivity problems will not have diagnostics collected for them.
If Mode is set to Generic, no SQL Server connectivity check is made. If you have selected SQL Server-specific diagnostics (e.g., a Profiler trace), the collector service will still attempt to collect them, but may not be able to if connectivity issues occur when trying to reach the server, or if you lack sufficient rights on the server. This mode is obviously useful for collecting diagnostics from non-SQL Server machines. It is also useful for collecting limited diagnostics from SQL Server machines that have installation or connectivity issues (e.g., you may want to collect a Perfmon log from a machine that is not allowing you to connect to SQL Server itself for some reason).
As I'm sure you've guessed, the Output Folder specifies the folder to which the collector will direct all of its diagnostic output. This folder must either already exist on the target machine or be creatable (which means that if it is a subfolder, the parent folders must already exist). If left unspecified, this defaults to a subfolder under the collector startup directory named OUTPUT. The drive on which this folder resides should have plenty of space and, ideally, should not be used by your SQL Server in any way (e.g., for data or log files). Exactly how much space you need varies based on which diagnostics you're collecting and how much activity is occurring on the target machine, so there's no way for the tool to know in advance whether you have enough.
You can also control how the collector service manages an existing output folder when it starts. You can instruct it to simply overwrite the contents of the folder (the default), or to clear or rename it before starting the new collection. Renamed output folders have the format Output_00001, Output_00002, and so forth, with the highest number being the most recently renamed.
The Compression setting controls whether the collector service compresses the diagnostic files it collects, either as it collects them (NTFS) or once it shuts down (Zip). When you select NTFS compression, the collector service starts a low-priority background thread that compresses the diagnostic files it has collected using NTFS compression (provided the output drive is an NTFS partition). This thread wakes up every 30 seconds and checks for new files that have not yet been compressed. When you select Zip compression, the collector compresses the files in the Output folder into a single Zip file (PSSDiag.zip, by default) when it shuts down. It begins by adding all internal diagnostic files (files specific to PSSDiag itself, not those you've requested—these have a prefix of ##) to a Zip file named PSSDiagInternal.zip. It then adds this file, along with all of the output files from the diagnostics you've requested, to a file named PSSDiag.zip (the name of the file is based on the name of the collector service, which can be changed, as I've mentioned).
The Scheduling section allows you to schedule how and when the collector service runs and shuts down. You can configure a Start Date/Time, an End Date/Time, and whether the service should automatically restart itself and/or collect only a snapshot of data and automatically shutdown. The times you specify can be relative or absolute, and you can mix the two (e.g., you can start at an absolute time, then shutdown at a relative one). A relative end time is always relative to the start time, not the current time, while a relative start time is always relative to the current time. If you specify an absolute date/time in the past, the collector will automatically adjust its date such that it occurs in the future. Depending on the time specified and the current time of day, that will either be today's date or tomorrow's date.
When the collector is waiting on a given time or time period to pass before beginning data collection, it uses absolutely no CPU resources on the host machine. It tells Windows what time it needs to wake up, and sleeps until Windows awakens it. When the start time is finally reached, Windows signals the collector that it's time to start collecting data, and diagnostic collection begins as usual.
Selecting the Run Continuously option tells the collector to automatically restart once it has shutdown due to the end time being reached. This is useful for running a particular diagnostic on a daily basis. Since you can specify relative times and times without dates, this feature would allow you, for example, to collect the same diagnostic data between 9 a.m. and 5 p.m. everyday. After shutting down at 5 p.m., the collector service will automatically restart itself and wait until 9 a.m. the next day to begin collecting data again.
Although PSSDiag runs a stored procedure to collect SQLDIAG reports by default, using SQLDIAG.exe instead is still supported for those who prefer it. With one notable exception, the stored procedure PSSDiag uses by default collects the exact same data as SQLDIAG.exe (actually a superset of it) and formats the data identically. The only significant difference between the two is that SQLDIAG.exe also collects an MSInfo32/WinMsd report, while PSSDiag's stored procedure does not. This is due to the fact that PSSDiag can be configured to collect MSInfo32/WinMsd separately via its custom diagnostics feature, as we'll discuss below. Collecting MSInfo32/WinMsd separately allows it to be collected in parallel with the SQLDIAG report's stored procedure. This results in a performance improvement overall and a shorter amount of time spent collecting the report. Given that MSInfo32/WinMsd can take a substantial amount of time to collect, there is a significant advantage in leaving this setting at its default rather than using SQLDIAG.exe to collect the report.
Once you click OK, the configuration information and startup parameters you've specified are written to the registry (in XML format) on the target machine, and the service is started. If you're collecting from the machine on which you're running the configuration application, DiagConfig will also automatically register the service for you before attempting to start it.
After the service starts, DiagConfig displays its output in the Output page of the Instructions/Output section of the GUI. Figure 4 illustrates:
Figure 4. DiagConfig displays the output for the PSSDiag collector service.
This output display is updated every 5 seconds and simply displays the portion of the target machine's application event log that pertains to this particular run of the collector service. You can also use the Windows Event Viewer tool to view the same info, although I think you'll find DiagConfig much easier to use for this particular purpose.
While the service is running, you won't be able to change any of the configuration or startup parameters. Stop and restart the service if you wish to alter the diagnostics you're collecting.
Keep in mind that you can minimize DiagConfig or even exit it altogether while the collector service runs. DiagConfig creates a tray icon so that you can control the service via the system tray if you like.
Note that you don't have to use DiagConfig to start and stop the collector service. You can also use Windows' Service Control Manager application or the DiagControl command file that's included with the collector. If you start the service via DiagConfig, then stop it through some other mechanism, DiagConfig will detect this and reflect the appropriate status and output in the user interface it presents. For reasons I won't go into here, you should not use the Service Control Manager application to stop the collector service. Instead, use either DiagConfig or the DiagControl command file.
The Collector Service
In this section, I'll talk more about the collector service, how it works, and how you can use it. If you're using DiagConfig to control and configure the service, you won't normally need to interact directly with the service, but it's still important to understand how it works.
The collector service is a native-code application that can be run as either a console application or a service. DiagConfig always runs it as a service. This allows it to be configured and controlled on remote machines and allows the user to log out of the host machine without stopping diagnostics collection (all non-service applications are stopped when a user logs out).
Regardless of how it is started, the collector always writes significant status information to the host machine's application event log (the Source column in each log record will be set to the service's name—"PSSDIAG" by default). If started as a console application, it also writes this information to the system console. Any errors or other critical messages will be displayed in red.
Additionally, all status messages and all internal diagnostic messages are written to the collector's log file, ##PSSDiag.log, by default. This file will contain a superset of the status information written to the console and application log and is generally used by Microsoft Product Support Services to troubleshoot issues related to PSSDiag itself.
Output from child processes spawned by the collector service (e.g., custom diagnostics) that is not otherwise redirected is trapped in a file named ##console.log. Because the Windows command processor defaults to ANSI output, this file is an ANSI file, even though PSSDiag itself is a Unicode application.
Once the collector starts, it reads the configuration information and startup parameters that have been passed to it. If running as a service, it gets its startup parameters from the registry. If running as a console application, it gets them from the command line.
PSSDiag supports reading its configuration information from three different sources: an INI file, an XML document, or the registry (you can specify a configuration file via the /I startup parameter). If a configuration file is not specified in the startup parameters, the collector first checks for PSSDiag.ini and loads that file if it exists. The INI format is a simple name-value pair file and looks something like this:
[.] SetupVer=22.214.171.124 CaseNumber= SSVer=8 Instance=* WindowsAuth=1 Username= SQLDiag=0 SQLDiagStartup=0 SQLDiagShutdown=1 EventLogs=0 EventLogsStartup=0 EventLogsShutdown=1 Profiler=1 ProfilerEvents=76,75,92,94,93,95,16,22,21,33,67,55,80,61,69,25,59,60, 27,58,14,15,81,17,10,11,35,36,37,19,50,12,13 ...
The format is pretty straightforward: a 1 or 0 specifies the value of each Boolean switch; settings with multiple values are either separate INI keys or are comma-delimited, etc.
The INI section heading ([.] above) represents the target machine name, and you can set up a single file that targets multiple machines, each with its own section in the INI. This is a powerful feature that allows you to easily collect diagnostics from multiple machines with a single instance of PSSDiag. When the collector starts, it enumerates the machines listed in the configuration file and collects from all of them simultaneously.
DiagConfig works exclusively with XML configuration files and the registry; it does not read or write INI configuration files. You will typically only see these from Microsoft Product Support Services engineers. Because PSSDiag.ini is the default configuration file name, this provides a convenient mechanism for Microsoft Product Support Services to temporarily override the diagnostics it needs the collector to gather without affecting any diagnostics you may have configured. Once your support incident is closed, you can safely delete this INI file so that the collector service will read the configuration values stored in the registry by DiagConfig.
If PSSDiag.ini is not present, the collector next looks for PSSDiag.xml. The PSSDiag XML configuration file format is semantically equivalent to its INI format and looks something like this:
An XML schema, pssdiag_schema.xsd, is used to ensure that the format is matches what the collector is expecting. If you edit a PSSDiag XML configuration file with an XML editor, this schema will protect the file against changes that would confuse or break the collector.
The main advantage of the XML format over the INI format is that using the XML format allows you to collect different diagnostics for the different SQL Server instances installed on a given machine. With the INI format, you must collect the same diagnostics from all of the instances that you target on a particular machine. You can choose whether to collect from a single instance or from all of them, but you must collect the same diagnostics from all of those you select. That's not true of XML format. Because each instance gets its own section in the XML document, each can have a distinct diagnostics configuration.
As I've mentioned, DiagConfig works exclusively with the XML configuration files and registry configuration. As a rule, you'll work only with XML configuration information when interacting with the collector service unless you have engaged Microsoft Product Support Services to assist with troubleshooting an issue.
If PSSDiag.xml is not present, the collector next checks the registry for configuration information. DiagConfig stores the configuration information you specify within it to the registry in XML format under the key HKLM\Software\Microsoft\SQLDiagEx\PSSDiag. The format rules for registry configuration settings match those for XML configuration files, and you shouldn't modify this information directly unless asked to by Microsoft. Use DiagConfig if you need to alter the collector's configuration information.
Because DiagConfig always writes configuration information to the registry and an INI or XML configuration file takes precedence over registry configuration information, you'll want to make sure that there isn't an XML or INI configuration file present when using DiagConfig to configure data collection. As I said above, the easiest way to do this is simply to delete any configuration files you receive from a Microsoft Product Support Services engineer once your support incident is closed.
As with INI configuration files, you can configure the simultaneous collection of diagnostics from multiple machines using the XML configuration format. This simply amounts to setting up multiple <Machine> tags in the XML document and supplying the requisite sub-elements and attributes. The easiest way to do this is to copy and paste the <Machine> node section for one machine in order to create multiple Machine sections.
While I'm on the subject, another point worth making here is that you can also collect diagnostics from multiple machines with a single collector instance without setting up multiple configuration file sections. The collector supports the use of an asterisk (*) for the machine name, which will be evaluated differently during collection based on the mode in which the collector is running.
You'll recall from the discussion above that the Mode switch controls whether PSSDiag enables certain SQL Server-specific connectivity and permissions checks. It also affects how * is evaluated when specified for the target machine name. In SQL Server mode, * will cause the collector to enumerate all of the visible SQL Servers on the current domain. These servers must be running and broadcasting themselves as SQL Servers. In Generic mode, using * for the machine name will cause the collector to enumerate all of the servers visible on the current domain, regardless of whether they're running SQL Server, and attempt to collect diagnostics from them. So, you can use this feature to easily collect the same diagnostics from every SQL server or even from every machine on the collector's host domain.
Another way to collect the same diagnostics from multiple machines is to use a machine list file (use the /M startup parameter for this and prefix the file name with @—e.g., /M @myservers.txt). A machine list file is simply a text file with the machines from which you want to collect diagnostics listed on separate lines. Specifying a machine list file causes the machine name specified in the configuration file to be overridden with those specified in the file and diagnostics to be collected from all of them simultaneously. If you want to override the machine name specified in the configuration file with a single machine name, simply specify it via the /M startup parameter (e.g., /Mmymachine).
The collector service features special support for gathering diagnostics from Windows clusters. When running on a cluster node and when (the default) has been specified for the machine name, the collector service will detect that it is running on a cluster and, if in SQL Server mode (versus Generic mode), will retrieve the virtual SQL Servers installed on the cluster and collect from all of them simultaneously. This means that if there's just one virtual SQL Server, you do not need to do anything special for the collector to recognize and properly collect diagnostics from it. If you want to collect from just one virtual SQL Server among many on the same machine, specify that virtual SQL Server for the target machine name.
The Collection Process
Once the configuration information has been processed, the collector service begins collecting the diagnostics you've requested. For each machine section in the configuration info, the collector starts a separate worker thread and collects from all machines simultaneously. All of the diagnostics for a particular machine are collected using just one worker thread. This means that if you only collect diagnostics from a single machine (which is the default), the collector will create just one worker thread to collect the requested data. This design allows PSSDiag to scale almost linearly and collect data from multiple machines at once in an efficient manner.
Note that this design doesn't mean that all diagnostics for a particular machine must be collected in a serial, synchronous fashion. On the contrary, most of them are collected in parallel. Since we are only using a single worker thread for each machine, how is this possible? It's possible because shelling to other tools collects all of the diagnostics, except for Perfmon logs. Often this is the SQL Server OSQL utility (SQLCMD on SQL Server 2005), but sometimes it's the Windows command processor or some other executable. By allowing these other tools to do most of the work of collecting diagnostics, the collector assumes more of a supervisory role, a task that it can easily handle with just one worker thread per machine. It monitors the processes it spawns and waits for them to complete as appropriate, often waiting on several of them at once.
During the collection process, each diagnostic that you've requested will be collected as specified and will have its output routed to the Output folder. The lone possible exception to this is the Profiler trace. If you are running the collector service on one machine and collecting a Profiler trace from another, the Profiler trace will be written to the remote machine. This is due to the fact that the collector uses SQL Server sp_trace_* extended stored procedures to collect Profiler traces, and those procedures always run within the context of the host SQL Server. This means that the output folder you specify must exist on both the machine on which the collector is running, as well as the remote SQL Server machine. So, if you set the output folder to "C:\PSSDiag\Output", it must exist on both machines. During the shutdown process, the collector will attempt to use an admin share on the remote system to copy the trace files back to the Output folder on the local machine, so that all of your diagnostic output is in one place and can be compressed into a single file if you have enabled Zip compression.
Note that while the collector supports the notion of collecting data from remote machines, that's not how it works by default, and you shouldn't use this facility unless you actually need it. A much easier way to collect data from remote machines is to install the collector service on each of them, then use DiagConfig to manage your interaction with it. DiagConfig does not directly support collecting diagnostics from a machine that has not had the collector service installed on it.
The Shutdown Process
Once the collector service has started collecting data, it continues to do so until you stop it or until it hits a shutdown condition (e.g., it reaches a previously specified end date/time or stops automatically because you have requested a diagnostic snapshot only, etc.). When running as a service, the best way to stop the collector is to click Stop in DiagConfig. You can also run DiagControl STOP in the collector's startup folder, or use the Service Control Manager to stop it. For reasons I won't go into here, you should use DiagConfig or DiagControl rather than the Windows Service Control Manager when possible, as I mentioned earlier.
When the collector shuts down, it stops all diagnostics it has started. This means that it stops any Profiler traces it has previously started, it kills any T-SQL scripts it currently has running, and it terminates any sub-processes it has spawned (including any they have spawned). Once all diagnostics have stopped, the collector will compress its output into a Zip file if you've requested it to, then exit.
If you specified the Run Continuously startup parameter, the collector will restart just prior to exiting, provided that it has stopped due to having reached a specified end time. It will behave as though it has just been restarted manually. As I mentioned earlier, this allows you to easily collect a given set of diagnostics on a daily basis without actually having to start and stop the collector yourself.
Normally, you will not need to install the collector service yourself. When you use DiagConfig to manage the service on a local machine, it will take care of registering the service as necessary. There are, however, situations where you may want to register it manually, so I'll provide some instructions on how to do this. You might want to do this, for example, if you're going to install the service on several machines and manage them all from a single machine using DiagConfig. You might also manually register the service if you need to install the collector on a server that does not have the .NET Framework installed. Since DiagConfig is a managed code application, you will have to either install the .NET Framework prior to running it or register and interact with the collector service without the benefit of DiagConfig.
First, the collector executable, PSSDiag.exe, can register itself as a service via the /R command-line parameter (/U deregisters it). Any parameters you specify on the command line when registering will be preserved for future runs of the service. Deregistering the service not only removes the service, it also deletes the PSSDiag-specific registry keys, making an uninstall application unnecessary. The PSSDiag uninstallation process simply amounts to deregistering the service and deleting the folder containing the binaries and support files.
Second, you can control the name of the collector service via the /A command-line parameter. Use this option to install multiple instances of PSSDiag on a single machine. If left unspecified, the default service name is PSSDiag, but you can use any name you like. All output files, messages, etc., will be tailored to whatever name you've used. For example, this allows you to have one instance of the collector service that runs around the clock collecting a low-impact set of diagnostics, and another that you run as needed to collect a more intrusive set. If you elect to install multiple instances of the collector service on the same machine, a best practice is to copy the PSSDiag binaries and support files to a separate folder for each instance, naming each folder after the instance name you intend to use. This alleviates having to remember to specify different output folders for each instance (since the default output folder is the OUTPUT subfolder under the service's startup directory) and helps prevent other types of potential file access conflicts.
With so much functionality already built-in, the next obvious step would be to support the collection of custom diagnostics so that you could collect whatever diagnostics you like, not just the ones built directly into the collector service. Fortunately, the collector provides a rich custom diagnostic facility, and, in fact, this facility is how some of the built-in diagnostics are collected.
The DiagConfig configuration application does not currently support the creation of configuration information containing custom diagnostics, but both the INI and XML configuration formats support custom diagnostics, and DiagConfig will preserve custom diagnostics you've configured in an XML configuration file that it reads. I'll explain more about this and how to add custom diagnostics to the configuration information that DiagConfig writes to the registry in just a moment.
Since you will normally only be working with XML configuration files, I'll show you what custom diagnostics look like in the XML format, then I'll explain what each piece signifies:
<Collector type="customdiagnostic" point="startup" groupname="Cluster Diag" taskname="Get PSTAT" customdiagtype="utility" wait="yes" cmd="`PSTAT50.exe > "%OUTPUT_PATH%%COMPUTERNAME%_PSTAT.TXT"`" />
The custom diagnostic node above adds a custom diagnostic named "Get PSTAT" to a diagnostic group named "Cluster Diag." Diagnostic groups are for informational purposes only and have nothing to do with how or when diagnostics are actually collected.
The point attribute configures the point in time during the collection process that the custom diagnostic is to be collected. The custom diagnostic above is configured to be collected when the collector starts, but can also be configured to be collected during the collector shutdown process.
This particular diagnostic has a customdiagtype of utility. There are several types to choose from:
Table 1. Custom diagnostic types
|Utility||An executable, .BAT, or .CMD file|
|TSQL_Script||A Transact-SQL script|
|TSQL_Command||A Transact-SQL command or stored procedure (e.g., sp_who)|
|Reg_Query||A registry query—returns the value of an entry in the registry|
|Reg_Save||Saves a particular registry entry as a .HIV file|
|Copy_File||Copies a file on the target system to the Output folder|
|Type_File||Lists a file on the target system, optionally redirecting it to a file in the Output folder|
|Tail_File||Dumps the last n bytes of a file to a file in the Output folder|
|VB_Script||An ActiveX script|
The wait attribute indicates whether the collector service will wait on the task to complete before continuing with other diagnostics. It can have one of three values: Yes, No, and OnlyOnShutdown. The first two are self-explanatory; OnlyOnShutdown tells the collector not to wait for the task to complete when it is first spawned, but to do so if the task is still running when the collector shuts down. This allows you to run lengthy custom diagnostic tasks in parallel with the rest of the collection process and wait on them to complete when the collector is stopped.
The use of the cmd attribute varies based on the type of custom diagnostic. It can often be more than just a file name or registry path—it can usually consist of a full command line for the custom diagnostic in question (e.g., command-line switches, redirection, and piping, etc.). Table 2 summarizes the various uses of the cmd attribute:
Table 2. The significance of the cmd attribute for each custom diagnostic type
|Type||Meaning of the cmd attribute|
|Utility||The name of an executable, .BAT, or .CMD file (specify the output file name via redirection, as illustrated above)|
|TSQL_Script||The name the Transact-SQL script. If this file has a .TEM extension, PSSDiag will translate embedded meta variables before running it. The output file name will be based on the script name|
|TSQL_Command||The Transact-SQL command or stored procedure (e.g., sp_who). The output name will be based on the group name, task name, the target server (and instance, as applicable), and the point at which the command was executed (startup or shutdown).|
|Reg_Query||The registry value to return (specify the output file name via redirection)|
|Reg_Save||The registry key to save (specify the output file name via redirection)|
|Copy_File||The names of the source and target files (surround file names containing spaces with double-quotes)|
|Type_File||The name of the file to list (specify the output file name via redirection)|
|Tail_File||The name of the file to list (specify the output file name via redirection)|
|VB_Script||The name of an ActiveX script (specify the output file name via redirection)|
As pointed out by the table, you use redirection to capture the output of many of the custom diagnostic types. You can use the %output_name% meta variable (discussed below) to make this very easy:
<Collector type="customdiagnostic" point="startup" groupname="Cluster Diag" taskname="Get PSTAT" customdiagtype="utility" wait="yes" cmd="`PSTAT50.exe > "%OUTPUT_NAME%.txt"`" />
The cmd attribute is delimited with backquotes (or graves – the ` character) so that it can contain other types of quotes. Because it is an XML attribute, it is also wrapped in double-quotes. Obviously, the INI format does not require (or even allow) these outer double-quotes.
If you use a file name or path that contains spaces (or that may contain spaces), be sure to enclose it in double-quotes, as demonstrated above. Because the Windows command processor will often receive the file name, path, or command you've specified as-is, it's important to follow the Windows rules for file paths.
In case it isn't obvious, any files that your custom diagnostic requires (in this example, PSTAT50.exe) should either be available in the system path or copied to the collector's startup folder. Any PSSDiag packages received from Microsoft Product Support Services will already include all of the files required to collect the requested diagnostics.
If you look closely at the example above, you'll notice the use of what appear to be %OUTPUT_PATH% and %COMPUTER_NAME% environment variables. Actually, only the second one, %COMPUTER_NAME%, is really an environment variable. The collector to the full path of the Output folder translates %OUTPUT_PATH% internally, and includes a trailing backslash. In PSSDiag parlance, these are meta variables—pseudo environment variables that are translated internally by the collector. There are several of these built in to the collector:
Table 3. Built-in meta variables
|%output_path%||The full Output folder path, including a trailing backslash|
|%server%||The name of the target machine|
|%instance%||The name of the target SQL Server instance—this will expand to all configured instances for a particular machine|
|%server_instance%||The target machine and SQL Server instance—this will expand to all configured instances for a particular machine|
|%startup_path%||The full path of the collector service's startup folder, including a trailing backslash|
|%ssuser%||The user name being used to connect to SQL Server|
|%sspwd%||The password being used to connect to SQL Server if SQL Server Authentication is being used. (For security reasons, do not store this password to disk or persist it in any form. It is intended for use on the command line for tools that require connectivity to SQL Server (e.g., OSQL) when SQL Server authentication is being used.)|
|%sspath%||The full path to SQL Server's root installation folder, including a trailing backslash|
|%ssregroot%||The root registry key for SQL Server|
|%authmode%||A 1 or 0 indicating whether Windows Authentication is being used to connect to SQL Server|
|%ssver%||The target version of SQL Server specified in the configuration file|
|%output_name%||A synthesized output file name consisting of the diagnostic group and task names, the target machine and SQL Server (as appropriate) and a string indicating when the diagnostic was collected|
|%internal_output_name%||A synthesized internal (## prefix) output file name consisting of the diagnostic group and task names, the target machine and SQL Server (as appropriate), and a string indicating when the diagnostic was collected|
|%ssbatchrunner%||The name of the executable used to run T-SQL scripts (OSQL.exe or SQLCMD.exe, depending on the version of SQL Server)|
|%quietmode%||A 1 or 0 indicating whether the quiet mode switch (/Q) was passed in the collector's startup parameters|
|%forceremotemode%||A 1 or 0 indicating whether a stored procedure is being used instead of SQLDIAG.exe to collect SQLDIAG reports|
|%compressionmode%||An integer indicating whether no compression, NTFS compression, or Zip compression has been selected|
|%genericmode%||A 1 or 0 indicating whether the Generic Mode switch (/G) was passed in the collector's startup parameters|
|%casenumber%||A string indicating the Product Support Services case number the collection is associated with|
|%starttime%||A string indicating the collection start date/time if one was specified|
|%endtime%||A string indicating the collection end date/time if one was specified|
|%comspec%||The full path to the Windows command interpreter (either cmd.exe or command.com, depending on the version)|
|%cfgfile%||The full path of the configuration file|
|%outputfoldermode%||An integer indicating whether the collector service should clear, overwrite, or rename an existing Output folder when it starts|
You can use any of these in the custom tasks you create as well as any environment variables on the system. The syntax for either is the same: simply delimit the variable name with percent (%) signs, as demonstrated above.
Custom Meta Variables
You can also define custom meta variables. You specify these as name-value pairs, and delimit them from the main cmd text (and from each other) with semicolons. Here's an example:
<Collector type="customdiagnostic" point="startup" groupname="MergeReplication" taskname="GeneralMergeAnalysis" customdiagtype="tsql_script" wait="yes" cmd="`merge_analysis.TEM; %pubdb%=Northwind `" />
In this example, we're configuring a custom T-SQL script template, merge_analysis_.te. to be executed. As mentioned in Table 2, a script with an extension of .tem is treated as a template, and the collector will automatically translate any embedded meta variables it contains (including system environment variables) before running it. This process results in the file being rewritten as a .SQL script prior to execution. In the example above, a single custom variable, %pubdb%, is defined with a value of Northwind. This will cause the collector to replace every occurrence of %pubdb% in the script template with Northwind prior to executing it.
You set up multiple custom meta variables by separating them with semicolons:
<Collector type="customdiagnostic" point="startup" groupname="MergeReplication" taskname="GeneralMergeAnalysis" customdiagtype="tsql_script" wait="yes" cmd="`merge_analysis.TEM; %pubdb%=Northwind; %distdb%=distribution `" />
Here, we define two custom variables: %pubdb% and %distdb%. Each time either occurs in the template script, it will be replaced with the appropriate value.
Meta variables defined by one custom diagnostic task can be reused in subsequent tasks. So, for example, if you define %pubdb% in custom task #1, you can then reuse it (without assigning it a value) in custom task #2. This means, for example, that a script template task can reference custom variables defined by previous tasks as though they had been assigned a value in the custom task definition itself.
You can also redefine custom variables. Subsequent custom tasks will use the most recently assigned value.
Prompting for Variables
Rather than assigning a fixed value to a meta variable, you can instruct the collector console application (but not the service) to prompt for a value for it at runtime. In the interest of making a diagnostic configuration more generic and reusable, there will be times that you'll want to defer assigning a value to a variable until the collector actually runs. You set this up via special tokens in the meta variable definition within the custom diagnostic:
<Collector type="customdiagnostic" point="startup" groupname="MergeReplication" taskname="GeneralMergeAnalysis" customdiagtype="tsql_script" wait="yes" cmd="`merge_analysis.TEM; %pubdb%=?!"pubdb","Please supply the name of the published database","string",""!? `" />
The prompting details are enclosed in ?! and !? tokens. These mark the start and end of fields defining the prompt. The prompting assignment consists of four subfields, and these are CSV-delimited. The first of these fields is the name of the variable. This is not used by the collector (it's used by other internal Microsoft tools), but must match the name used on the left side of the meta variable assignment statement. The second field is the prompt the collector is to display when prompting for the variable. The third field is the variable's data type. This is currently unused—all variables are treated as strings—but this may change in the future. The last field is the default value for the field. The collector will display it in parentheses according to when it prompts for the variable, and will be assigned to the variable if you press Enter without typing a new value when prompted.
For obvious reasons, the collector's ability to prompt for meta variables is only available when the collector is running as a console application. Services cannot interact easily with the Windows desktop, so there is no non-trivial way for the collector to prompt for input when it runs as a service. In fact, when running as a service, the collector will behave as though the quiet mode switch (/Q) has been specified and skip any input prompts it would otherwise display. This means that it will assign the default value for meta variable prompts that you've set up in the configuration info.
You can also define variables that have multiple values. You do this by setting up a child process to run and assigning its output (up to 2GB worth) to a variable. The variable can then be used with subsequent custom diagnostics to cause them to run once for each line of output stored in the variable.
This functionality is similar to the Windows for command. It provides the same functionality as this command without requiring you to write batch file code in order to get it.
Similar to prompting for a variable, assigning it using the output of another process is done using special tokens and sub-fields within the meta variable assignment. Here's an example:
<Collector type="customdiagnostic" point="startup" groupname="GeneralUtilities" taskname="ListFiles" customdiagtype="utility" wait="yes" cmd="`echo %file% >>"%output_name%.TXT"; %file%=!!dir /b *.txt!! `" />
The special delimiting token for the multi-value assignment is !!. In this example, we run an operating system command, dir /b *.txt, and assign the output it returns to the %file% variable. This particular command simply lists the file names matching the specified mask without any other directory header or detail info. These file names are then assigned to the %file% custom meta variable. When the collector then runs the specified echo %file% command for the custom diagnostic, it runs it once for each file name in the list, replacing %file% with each successive name from the list with each iteration.
This provides similar functionality to the operating system for command, and, in this particular example, doesn't provide functionality that couldn't also be realized through more straightforward mechanisms. However, having this functionality built into the collector allows the end user to avoid having to become intimately familiar with the nuances of lesser-used batch file constructs (such as for) and allows you to assign multi-value meta variables in the same way that you assign normal meta variables.
Adding Custom Diagnostics to DiagConfig
Given that DiagConfig has no direct support for adding these to the collector's configuration info, you may be wondering how you would configure the collector to gather a custom diagnostic. This is actually pretty simple. On startup, DiagConfig checks the system registry for collector configuration info, and loads it if present. If no configuration information is present in the registry, it loads an XML document (_Default.xml in the Pristine subfolder) containing a default configuration. By modifying this document, you can add custom diagnostics that the GUI will write to the registry when it starts the collector service. (To remove configuration information that has already been written to the registry, simply deregister the collector service, as I mentioned previously.)
To add a custom diagnostic to this document, add a Collector node like the one shown above to the ...\Instance\Collectors section of _Default.xml. Although you won't see the custom diagnostic represented in the GUI, it is there and will be preserved when the configuration information is later transferred to the registry on the target machine. When the collector starts, it will read your custom diagnostic along with the rest of the configuration information and will collect it for you.
Properly used, PSSDiag presents a diagnostic collection framework that is solid, powerful, intuitive, and extensible. In can significantly reduce the time needed to solve complex SQL Server- and non-SQL Server-related issues and can be used as an effective long-term monitoring solution for virtually any type of machine.
SQL Server for Developers
Ken Henderson is a husband and father living in suburban Dallas, Texas. He is the author of eight books on a variety of technology-related topics, including the recently released The Guru's Guide to SQL Server Architecture and Internals (Addison-Wesley, 2003). An avid Dallas Mavericks fan, Ken spends his spare time watching his kids grow up, playing sports, and gardening.