Troubleshooting and Analysis with Traces

The goal of SQL Trace is to help you—presumably, a DBA or database developer—analyze your system in order to either detect or solve problems. As SQL Server is such a large and complex product, using SQL Trace without knowing exactly what you want to do can sometimes feel like trying to find the proverbial needle in a very large haystack. In this section, we'll try to distill some of the lessons we've learned from doing countless traces into a solid framework upon which you can build your own tracing skills. We'll start with an overview of the event classes that we use most often, and then drill into some common scenarios in which SQL Trace can help in your day-to-day work.

Commonly Used SQL Trace Event Classes

With over 170 events to choose from, the Profiler Events Selection dialog can be a bit daunting to new users. Luckily for your sanity, most of these events are not commonly used in day-to-day tracing activity, and are provided more for the sake of automated tools. In this section, we will describe the events that we use most often in our work.

  • Errors and Warnings:Attention

    This event is usually fired when a client disconnects from SQL Server unexpectedly. The most common cause of this is a client library time-out, which is generally a 30-second timer that starts the moment a query is submitted. If queries are timing out, it’s something you want to know about immediately, so this event is used frequently.

  • Errors and Warnings:Exception and Errors and Warnings:User Error Message

    Exceptions and user error messages go hand-in-hand, and we always trace these two classes together. When a user exception occurs, both events will be fired. The Exception event will include the error number, severity, and state, whereas the User Error Message event will include the actual text of the error. We discuss these events in detail in the section “Identifying Exceptions” later in this chapter.

  • Locks:Deadlock Graph and Locks:Lock:Deadlock Chain

    In previous versions of SQL Server, deadlocks could only be identified through the slightly obscure Deadlock Chain event. SQL Server 2005 introduces the much more usable Deadlock Graph event, which produces standard XML that Profiler is able to render into a very nice graphical output. We’ll discuss this output in the “Debugging Deadlocks” section later in this chapter.

  • Locks:Lock:Acquired, Locks:Lock:Released, and Locks:Lock:Escalation

    We use these events mainly in conjunction with working on deadlocks. They provide insight into which locks SQL Server takes during the course of a transaction, and for how long they are held. These events can also be very interesting to monitor if you're curious about how SQL Server's various isolation levels behave. When using these events, make sure to filter on the specific spid that you're targeting, lest you get back far too much information to process.

  • Performance:Showplan XML Statistics Profile

    This event can be used to capture XML showplan output for queries that have run on the server you’re profiling. There are actually several different showplan and XML showplan event classes, but this one is the most useful, in our opinion, as it includes actual rowcounts and other statistics data, which can help when tuning queries. This event class will be discussed in more detail in the “Performance Tuning” section.

  • Security Audit (Event Category)

    Although this isn't an event class—but rather, a category that includes many event classes—we thought it belonged in this list because it contains a number of useful event classes to help you monitor virtually all security-related activity as it occurs on your server. This includes such information as failed login attempts (Audit Login Failed event class), access to specific tables or other objects (Audit Schema Object Access Event event class), and even when the server is restarted (Audit Server Starts and Stops event class). Most of these event classes are designed for SQL Server's built-in server auditing traces, described in the "Auditing: SQL Server's Built-in Traces" section at the end of this chapter.

  • Security Audit:Audit Login and Security Audit:Audit Logout

    We specifically singled these two events out of the overall Security Audit category because they are the two events in that category that we find useful on a day-to-day basis, especially when doing performance tuning. By monitoring these events along with various query events in the Stored Procedures and TSQL category, you can more easily aggregate based on a single session.

    Tip Thanks to a change in SQL Server 2005 SP2, these events now fire even for pooled logins and logouts, making them even more useful than before. To detect whether or not the event fired based on a pooled connection, look for a value of 2 in the EventSubClass column.
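
As a sketch, assuming the trace has been saved to a table named TraceTable and using the standard trace event class IDs of 14 for Audit Login and 15 for Audit Logout, pooled connection activity can be isolated like this:

```sql
-- Find login/logout events fired for pooled connections.
-- EventSubClass = 2 indicates a pooled connection (1 = nonpooled).
SELECT SPID, EventClass, EventSubClass, StartTime
FROM TraceTable
WHERE EventClass IN (14, 15)  -- Audit Login, Audit Logout
  AND EventSubClass = 2
ORDER BY StartTime;
```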

  • Stored Procedures:RPC:Starting and Stored Procedures:RPC:Completed

    These events fire when a remote procedure call (RPC; generally, a parameterized query or stored procedure call, depending on the connection library you’re using) is executed by a client application.

  • TSQL:SQL:BatchStarting and TSQL:SQL:BatchCompleted

    These events fire when an ad hoc batch is executed by a client application. Using these in combination with the RPC event classes will allow you to capture all requests submitted to the server by external callers. Both the BatchCompleted event class and the corresponding RPC:Completed event class populate four key columns: Duration, Reads, Writes, and CPU. We discuss these in more detail in the “Performance Tuning” section later in this chapter.

  • Stored Procedures:SP:StmtStarting and Stored Procedures:SP:StmtCompleted

    Sometimes it can be difficult to determine which access path was taken in a complex stored procedure full of flow control statements. These events are fired every time a statement in a stored procedure is executed, giving you a full picture of what took place. Note that these events can produce an extremely large amount of data. It is, therefore, best to use them only when you’ve filtered the trace by either a given spid you’re tracking, or a certain stored procedure’s name or object ID (using the ObjectName or ObjectId columns, respectively) that you’re interested in.

  • Stored Procedures:SP:Recompile

    Stored procedure recompiles are commonly cited as a potential SQL Server performance problem. SQL Server includes a counter to help track them (SQLServer:SQL Statistics:SQL Re-Compilations/Sec), and if you see a consistently high value for this counter you may want to consider profiling using this event class in order to determine which stored procedures are causing the problem.
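
If you'd rather check this counter from a query window than from System Monitor, it is also exposed through the sys.dm_os_performance_counters DMV. A minimal sketch follows; note that the value is cumulative since server startup, so sample it twice and compare:

```sql
-- Read the recompilation counter; sample twice over a known
-- interval and divide the difference by elapsed seconds for a rate.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name = 'SQL Re-Compilations/sec';
```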

    Note For more information on avoiding recompiles see Chapter 5, “Plan Caching and Recompilation.”

  • Stored Procedures:SP:Starting

    This event class fires whenever a stored procedure or function is called—no matter whether it is called directly by a client, or nested within another stored procedure or function. Since it does not populate the Reads, Writes, or CPU columns, this event class is not especially useful for performance tuning, but it does have its value. We use this class often for obtaining simple counts of the number of times a specific stored procedure was called within a given interval, and also for situations in which stored procedure calls are heavily nested and we need to determine exactly in which sequence calls were made that resulted in a certain procedure getting executed.
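
For the simple-count scenario, assuming the trace data has been loaded into a table named TraceTable and the ObjectName column was selected in the trace, a sketch might look like this (42 is the event class ID for SP:Starting):

```sql
-- Count how many times each stored procedure started
-- during the collection interval.
SELECT ObjectName, COUNT(*) AS Executions
FROM TraceTable
WHERE EventClass = 42  -- SP:Starting
GROUP BY ObjectName
ORDER BY Executions DESC;
```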

  • Transactions:SQLTransaction

    This event can be used to monitor transaction starts, commits, and rollbacks. You can determine which state the transaction is in by looking at the EventSubClass column, which will have a value of 0, 1, or 2 for a transaction starting, committing, or rolling back, respectively. Note that because every data modification uses a transaction, this event can cause a huge amount of data to be returned on a busy server. If possible, make sure to filter your trace based on a specific spid that you’re tracking.
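
Assuming the trace data has been loaded into a table named TraceTable, the EventSubClass values can be decoded with a simple CASE expression (50 is the event class ID for SQLTransaction):

```sql
SELECT
    SPID,
    EventSequence,
    CASE EventSubClass
        WHEN 0 THEN 'Begin'
        WHEN 1 THEN 'Commit'
        WHEN 2 THEN 'Rollback'
    END AS TransactionState
FROM TraceTable
WHERE EventClass = 50  -- SQLTransaction
ORDER BY EventSequence;
```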

  • User Configurable (Event Category)

    This event category contains 10 events, named UserConfigurable:0 through UserConfigurable:9. These events can be fired by users or modules with sufficient ALTER TRACE access rights, allowing custom data to be traced. We discuss some possibilities in the "Stored Procedure Debugging" section of this chapter.

Performance Tuning

Performance is always a hot topic, and for a good reason; in today’s competitive business landscape, if your users feel that your application is too slow they’ll simply move on to a different provider. In order to help you prevent this from happening, SQL Trace comes loaded with several event classes that you can harness in order to find and debug performance bottlenecks.

Note Performance monitoring is an art unto itself, and a complete methodology is outside the scope of this chapter; in this chapter, we’ll stick with only what you can do using SQL Trace. Different aspects of troubleshooting for performance are discussed throughout this book. In addition, for a comprehensive discussion on the topic of how to think about SQL Server performance, please refer to Inside SQL Server 2005: T-SQL Querying (Microsoft Press, 2006).

Performance monitoring techniques can be roughly grouped into two categories: those you use when you already know something about the problem, and those you use to find out what the problem is (or find out if there even is a problem). These categories tend to dovetail nicely; once you’ve found out something about the problem, you can start heading in the direction of getting more information. Therefore, let’s start with the second technique, helping to pinpoint problem areas, and then move into how to conduct a more detailed analysis.

When walking into a new database performance tuning project, the very first thing we want to find out is what queries make up the “low-hanging fruit.” In other words, we want to identify the worst performance offenders, as these are the ones that can give us the biggest tuning gains. It’s important at this stage to not trace too much information, so we generally start with only the Stored Procedures:RPC:Completed and TSQL:SQL:BatchCompleted events. These events are both selected in the TSQL_Duration template that ships with SQL Server Profiler. We recommend adding the Reads, Writes, and CPU columns for both events, which are not selected by default in the template, in order to get a more complete picture. We also recommend selecting the TextData column rather than the (default) BinaryData column for the RPC:Completed event—this will make it easier to work with the data later. A properly specified set of events is shown in Figure 2-13.


Figure 2-13 Selected events/columns for an initial performance audit

Once you've selected the events, set a filter on the Duration column for a small number of milliseconds. Most of the active OLTP systems we've worked with have an extremely large number of zero-millisecond queries, and these are clearly not the worst offenders in terms of easy-to-fix performance bottlenecks. We generally start with the filter set to 100 milliseconds, and work our way up from there. The idea is to increase the signal-to-noise ratio on each iteration, eliminating "smaller" queries and keeping only those that have a high potential for performance tuning. Depending on the application and server load, we generally run each iterative trace for 10 to 15 minutes, then take a look at the results and increase the filter value appropriately until we net only a few hundred events over the course of the trace. Note that the 10- to 15-minute figure may be too high for some extremely busy applications. See the "Tracing Considerations and Design" section later in this chapter for more information.
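
If you script the trace to run server-side rather than in Profiler, the same filter is applied with sp_trace_setfilter. One caveat: in SQL Server 2005, server-side traces record Duration in microseconds, even though Profiler displays milliseconds, so the 100-millisecond threshold must be expressed as 100,000. A sketch (the trace ID of 2 is hypothetical; use the value returned by sp_trace_create):

```sql
DECLARE @TraceID int, @Duration bigint;
SET @TraceID = 2;        -- hypothetical; returned by sp_trace_create
SET @Duration = 100000;  -- 100 ms, expressed in microseconds

-- Column 13 = Duration; logical operator 0 = AND;
-- comparison operator 4 = greater than or equal.
EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @Duration;
```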

The other option is to run only the initial trace, and then filter the results down from there. A simple way to handle that is with SQL Server 2005’s NTILE windowing function, which divides the input rows into an equal number of “buckets.” To see only the top 10 percent of queries in a trace table, based on duration, use the following query:

SELECT *
FROM
(
    SELECT
        *,
        NTILE(10) OVER (ORDER BY Duration) AS Bucket
    FROM TraceTable
) x
WHERE Bucket = 10

Note The execution by an application of an extremely large number of seemingly small—even zero-millisecond—queries can also be a performance problem, but it generally has to be solved architecturally, by removing chatty interfaces, rather than via Transact-SQL query tuning. Profiling to find these kinds of issues can also be incredibly difficult without knowledge of exactly what a particular application is doing, so we will not cover it here.

If you find that it is difficult to get the number of events returned to a manageable level—a common problem with very busy systems—you may have to do some tweaking of the results to group the output a bit better. The results you get back from SQL Trace will include raw text data for each query, including whichever actual arguments were used. To analyze the results further, the data should be loaded into a table in a database, then aggregated, for example, to find average duration or number of logical reads.
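
The built-in fn_trace_gettable function handles the load; a sketch, assuming a hypothetical file path:

```sql
-- Load the trace file into a table; passing DEFAULT for the
-- second parameter also reads any rollover files in the sequence.
SELECT *
INTO TraceTable
FROM fn_trace_gettable('C:\Traces\PerfAudit.trc', DEFAULT);
```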

The problem is doing this aggregation successfully on the raw text data returned by the SQL Trace results. The actual arguments are good to know—they’re useful for reproducing performance issues—but when trying to figure out which queries should be tackled first we find that it’s better to aggregate the results by query “form.” For example, the following two queries are of the same form—they use the same table and columns, and only differ in the argument used for the WHERE clause—but because their text is different it would be impossible to group them as-is:

SELECT * 
FROM SomeTable 
WHERE SomeColumn = 1 
---
SELECT * 
FROM SomeTable 
WHERE SomeColumn = 2

To help solve this problem, and reduce these queries to a common form that can be grouped, Itzik Ben-Gan provided a CLR UDF in Inside SQL Server 2005: T-SQL Querying; a slightly modified version, which also handles NULLs, follows:

// Requires the following namespaces:
// using System.Data.SqlTypes;
// using System.Text.RegularExpressions;
[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true)]
public static SqlString sqlsig(SqlString querystring)
{
    return (SqlString)Regex.Replace(
        querystring.Value,
@"([\s,(=<>!](?![^\]]+[\]]))(?:(?:(?:(?:(?#   expression coming
)(?:([N])?(')(?:[^']|'')*('))(?#    character
)|(?:0x[\da-fA-F]*)(?#  binary
)|(?:[-+]?(?:(?:[\d]*\.[\d]*|[\d]+)(?#  precise number
)(?:[eE]?[\d]*)))(?#    imprecise number
)|(?:[~]?[-+]?(?:[\d]+))(?# integer
)|(?:[nN][uU][lL][lL])(?#   null
))(?:[\s]?[\+\-\*\/\%\&\|\^][\s]?)?)+(?#    operator
)))",
        @"$1$2$3#$4");
}

This UDF finds most values that “look” like arguments, replacing them with a “#”. After processing both of the preceding queries with the UDF, the output would be the same:

SELECT * 
FROM SomeTable 
WHERE SomeColumn = #

To use this UDF to help with processing a trace table to find the top queries, you might start with something along the lines of the following query, which groups each common query form and finds average values for Duration, Reads, Writes, and CPU:

SELECT
    QueryForm,
    AVG(Duration) AS AvgDuration,
    AVG(Reads) AS AvgReads,
    AVG(Writes) AS AvgWrites,
    AVG(CPU) AS AvgCPU
FROM
(
    SELECT
        dbo.fn_sqlsig(TextData) AS QueryForm,
        1.0 * Duration AS Duration,
        1.0 * Reads AS Reads,
        1.0 * Writes AS Writes,
        1.0 * CPU AS CPU
    FROM TraceTable
    WHERE TextData IS NOT NULL
) x
GROUP BY QueryForm

From here you can further filter by the average values in order to find those queries to which you’d like to dedicate a bit more attention.
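
For instance, a HAVING clause added to the aggregation keeps only the forms worth investigating (the 500-millisecond cutoff here is an arbitrary starting point; adjust it to your workload):

```sql
SELECT QueryForm, AVG(Duration) AS AvgDuration
FROM
(
    SELECT
        dbo.fn_sqlsig(TextData) AS QueryForm,
        1.0 * Duration AS Duration
    FROM TraceTable
    WHERE TextData IS NOT NULL
) x
GROUP BY QueryForm
HAVING AVG(Duration) >= 500;
```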

Once you have decided upon one or more queries to tune, you can use SQL Trace to help with further analysis. For example, suppose that you had isolated the following stored procedure, which can be created in the AdventureWorks database, as a culprit:

CREATE PROCEDURE GetManagersAndEmployees
    @EmployeeID int
AS
BEGIN
    SET NOCOUNT ON;

    EXEC uspGetEmployeeManagers @EmployeeID;
    EXEC uspGetManagerEmployees @EmployeeID;
END

To begin a session to analyze what this stored procedure is doing, first open a new query window in SQL Server Management Studio, and get the spid of your session using the @@SPID function. Next, open SQL Server Profiler, connect to your server, and select the Tuning template. This template adds SP:StmtCompleted to the combination of events used to get a more general picture of server activity. This will result in a lot more data returned per call, so use the spid you collected before to filter your trace. You also might wish to add the Showplan XML Statistics Profile event, in order to pull back query plans along with the rest of the information about your query. Figure 2-14 shows a completed Events Selection screen for this kind of work.

Note Adding a Showplan XML or Deadlock Graph event causes an additional tab to open in the Trace Properties dialog, called Events Extraction Settings. This tab contains options for automatically saving any collected query plans or deadlock graph XML to text files, in case you need to reuse them later.

Next, go ahead and start the trace in SQL Server Profiler. Although we generally use server-side traces for most performance monitoring, the amount of overhead that Profiler brings to the table when working with a single spid to process a single query is small enough that we don’t mind harnessing the power of the UI for this kind of work. Figure 2-15 shows the Profiler output on our end after starting the trace and running the query for @EmployeeID=21. We have selected one of the Showplan XML events in order to highlight the power of this feature; along with each statement executed by the outermost stored procedure and any stored procedures it calls, you can see a full graphical query plan, all in the Profiler UI. This makes it an ideal assistant for helping you tune complex, multilayered stored procedures.

SQL Trace will not actually tune for you, but it will help you to find out not only which queries are likely to be causing problems, but also which components of those queries need work.

However, performance tuning is far from the only thing that it can be used for. In the next section, we’ll explore another problem area—exceptions—and how to use SQL Trace to help in tracking them down.


Figure 2-14 Trace set up for performance profiling of a single spid’s workload


Figure 2-15 Using Profiler to trace batches, statements, and query plans

Note Saying that SQL Trace will not actually tune for you is only mostly correct. SQL Server’s Database Engine Tuning Advisor (DTA) tool can take trace files as an input, in order to help recommend indexes, statistics, and partitions to make your queries run more quickly. If you do use the DTA tool, make sure to feed it with a large enough sample of the queries your system generally handles. Too small a collection size will skew the results, potentially causing the DTA to make subpar recommendations, or even produce suggestions that cause performance problems for other queries that weren’t in the input set.

Identifying Exceptions

In a perfect world, all exceptions would be caught, handled, and logged. Appropriate personnel would regularly watch the logs and create bug reports based on the exceptions that occurred, such that they could be debugged in a timely manner and avoided in the future. But alas, the world is far from perfect, and it is quite common to see applications that regularly bubble exceptions all the way from the database to the user interface, with no logging of any kind. Or, worse, applications that catch and swallow exceptions, resulting in no one ever knowing that they occurred, yet occasionally resulting in strange data returned to users who aren’t quite sure what’s going on. In order to find out when either of these scenarios is happening, we often take a proactive approach and watch for exceptions so that we can find and fix them—hopefully before they frustrate too many users.

Tracing for exceptions is fairly simple; you can start with the TSQL template, which includes the Audit Login and Audit Logout events, as well as the ExistingConnection event—all of which can be removed for this exercise. That leaves RPC:Starting and SQL:BatchStarting, both of which are needed in order to trace the Transact-SQL that caused an exception, whether it occurred as the result of an SQL batch or RPC call. It's important to trace the Starting rather than Completed events in this case, because certain errors can result in the Completed event not firing for a given query.

To the RPC and SQL events, add the Attention, Exception, and User Error Message events from the Errors and Warnings category. The Attention event fires whenever a client forcibly disconnects; the best example of this is a client-side query time-out. These are good to know about, and usually indicate performance or blocking issues. The Exception event fires whenever an exception of any kind occurs, whereas the User Error Message event fires either in conjunction with an exception—to send back additional data about what occurred, in the form of a message—or whenever any of a variety of statuses changes, such as a user switching from one database to another.

We also recommend adding the EventSequence column for each selected event class. This will make querying the data later much easier. Figure 2-16 shows a completed Events Selection dialog box for the events and columns that we recommend for exception monitoring.

Note SQL Server uses exceptions internally to send itself messages during various phases of the query execution process. A telltale sign of this is an Exception event with no corresponding User Error Message event. Should you see such a situation, do not be alarmed; it’s not actually an error meant for you to consume.

Once you've selected the appropriate events, script the trace and start it. This is the kind of trace you may want to run in the background for a while, doing occasional collections. See the "Tracing Considerations and Design" section for more information. Generally, you may find it interesting to collect this data during fairly busy periods of activity, in order to find out what exceptions your users may be experiencing. Since this kind of tracing is more like casting a net and hoping to catch something than a hunt for something specific, timing is of the essence. You may or may not see an exception, but just because you don't see one during one collection period doesn't mean they aren't happening; make sure to monitor often.


Figure 2-16 Events Selection used for tracing to discover exceptions and disconnections

Once you’ve captured your data and transferred it into a table, reporting on which exceptions occurred is a matter of finding:

  • All Attention events (EventClass 16) and the Transact-SQL or RPC event (EventClass 13 and 10, respectively) on the same spid that preceded the disconnection.
  • All Exception events (EventClass 33) immediately followed by a User Error Message event (EventClass 162), and the Transact-SQL or RPC event on the same spid that preceded the exception.

All logic for following and preceding can be coded using the EventSequence column, which is why we recommend including it in this trace. The following query uses this logic to find all user exceptions and disconnections, related error messages where appropriate, and the queries that caused the problems:

;WITH Exceptions AS
(
    SELECT
        T0.SPID,
        T0.EventSequence,
        COALESCE(T0.TextData, 'Attention') AS Exception,
        T1.TextData AS MessageText
    FROM TraceTable T0
    LEFT OUTER JOIN TraceTable T1 ON
        T1.EventSequence = T0.EventSequence + 1
        AND T1.EventClass = 162
    WHERE
        T0.EventClass IN (16, 33)
        AND (T0.EventClass = 16 OR T1.EventSequence IS NOT NULL)
)
SELECT *
FROM Exceptions
CROSS APPLY
(
    SELECT TOP(1)
        TextData AS QueryText
    FROM TraceTable Queries
    WHERE
        Queries.SPID = Exceptions.SPID
        AND Queries.EventSequence < Exceptions.EventSequence
        AND Queries.EventClass IN (10, 13)
    ORDER BY EventSequence DESC
) p

If you’ve collected a large number of events, you can greatly improve the performance of this query by creating a clustered index on the trace table, on the EventSequence column.
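
A sketch of such an index (the index name is arbitrary):

```sql
CREATE CLUSTERED INDEX IX_TraceTable_EventSequence
ON TraceTable (EventSequence);
```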

Tip If you’re aware of an exception condition that occurs when a certain stored procedure is called, but you need more information on what sequence of events causes it to fire, you might work with the same events detailed in the preceding section, for performance tuning of a single query. Switch the query events shown in that section from Completed to the corresponding Starting classes, and add the Exception and User Error Message events. Much like with the tuning of a single query example, this is something you should run directly in SQL Server Profiler, with a filter on the spid from which you’re working.

Debugging Deadlocks

Tracking general exceptions is a good thing, but every DBA knows the horrors of dealing with a certain type of exception: message number 1205, severity 13, familiarly known as deadlocks. Deadlock conditions are difficult to deal with because it often feels like you just don't get enough data from the server to help you figure out exactly what happened and why. Even the error message returned by the server is somewhat hopeless; the only suggestion it makes is that you might "rerun the transaction."

SQL Trace has long exposed tools to help with isolating and debugging deadlock conditions, but SQL Server 2005 takes these to the next level, providing a very useful graphical interface to help you resolve these nasty issues. To illustrate what is available, we’ll show you how to force a deadlock in the tempdb database. Start with the following code:

USE tempdb;
GO

CREATE TABLE Deadlock_Table
(
    ColumnA int NOT NULL PRIMARY KEY
);
GO

INSERT Deadlock_Table
SELECT 1 UNION ALL SELECT 2;
GO

By starting two separate transactions and staggering updates to the rows in the opposite order, we can make a deadlock occur and observe how SQL Trace can help with debugging.

Note The following example assumes that you have already identified the two stored procedures or queries involved in causing the deadlock, either based on an exception trace as shown in the preceding section, or by enabling trace flag 1222 at the server level (add –T1222 to the startup parameters for the SQL Server service) and collecting the deadlocking resources from the SQL Server error log. Once you have identified the participating queries, conduct the actual research on a development instance of SQL Server on which you’ve restored the production data. Debugging deadlocks can require collecting a lot of data, and because some of the events are fired by system spids it can be impossible to filter the trace so that you only collect relevant data. On a busy system this trace will create a very large amount of load, so we recommend always working offline.
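
If restarting the service to add the startup parameter isn't convenient, trace flag 1222 can also be enabled globally at runtime:

```sql
-- Enable deadlock output to the error log for all sessions (-1 = global)
DBCC TRACEON (1222, -1);

-- Check whether the flag is active
DBCC TRACESTATUS (1222);

-- Disable it again when the investigation is complete
DBCC TRACEOFF (1222, -1);
```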

To begin with, open two query windows in SQL Server Management Studio, and collect their spids using @@SPID. You may want to use these spids later to help with analysis of the collected trace data. Start a new SQL Server Profiler session and use the TSQL template to select the RPC:Starting and SQL:BatchStarting events. To these, add the Deadlock graph, Lock:Acquired, Lock:Escalation, and Lock:Released events, all from the Locks category. The Lock events will help you analyze the sequence of locks taken to contribute to the deadlock condition, and the Deadlock graph event will help with a graphical display of what went wrong.

You might optionally consider adding the SP:StmtStarting event, in case one or more of the stored procedures you’re debugging are running numerous statements, any of which might be contributing to the deadlock. You should also add the EventSequence column, in order to make it easier to analyze the data after collection. Figure 2-17 shows the completed Events Selection dialog box for this activity.

Tip In this example, we will show you how to run the statements in exactly the correct order and at the correct times to force a deadlock to occur while you watch the trace in SQL Server Profiler. However, in many real-world situations it’s not quite that easy; many deadlocks depend upon exact timing (or mistiming) and to reproduce them you’ll have to run each query in a loop, hoping that they eventually collide in just the right way. SQL Server Management Studio also has a feature to help you run a query in a loop as many times as you’d like. In each query window, set up the batch for your queries, then follow the batch with GO and the number of times you’d like the query to run. For example, the following Transact-SQL code will execute the MyStoredProcedure stored procedure 1,000 times:

EXEC MyStoredProcedure

GO 1000


Figure 2-17 Completed Events Selection dialog box for debugging deadlocking queries

Once your events are set up, start the trace in Profiler. Note that no filters should be used in this case, as the Deadlock graph event may be fired by any number of system spids. As a result of the lack of filters, you’ll see some system lock activity. This can be ignored or filtered out after the trace has been completed. Another strategy is covered later in this chapter, in the “Reducing Trace Overhead” section.

Once the trace is started, return to the first query window (spid 52 in our test), and run the following batch:

BEGIN TRANSACTION 
UPDATE Deadlock_Table 
SET ColumnA = 3 
WHERE ColumnA = 1 
GO

Next, run the following batch in the second query window (spid 53 in our test):

BEGIN TRANSACTION 
UPDATE Deadlock_Table 
SET ColumnA = 4 
WHERE ColumnA = 2 
GO

Both of these queries should return, since their locks are compatible; they’re each taking locks on different rows of the Deadlock_Table table. Back in the first query window, start the following update, which will begin waiting on the second window’s session to release its lock:

UPDATE Deadlock_Table 
SET ColumnA = 4 
WHERE ColumnA = 2 
GO

Finally, return to the second window and run the following update, which will start waiting on the first window’s session to release its lock. Since both sessions will now be waiting for each other to release resources, a deadlock will occur:

UPDATE Deadlock_Table 
SET ColumnA = 3 
WHERE ColumnA = 1 
GO

After the deadlock has occurred the trace can be stopped. Find the Deadlock graph event produced, which should look something like the one shown in Figure 2-18. The Deadlock graph event includes a huge amount of data to help you debug what occurred. Included are the object ID, the index name (if appropriate), and the HoBt (Heap or B-Tree) ID, which can be used to narrow down the locking resource even further, by using the hobt_id column of the sys.partitions view. In addition, you can determine the actual queries that were involved in the deadlock by scrolling back and finding the last query event run by each spid prior to the deadlock occurring.
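
A sketch of that lookup (the HoBt ID literal here is hypothetical; substitute the value from your deadlock graph):

```sql
SELECT
    OBJECT_NAME(p.object_id) AS TableName,
    i.name AS IndexName
FROM sys.partitions p
JOIN sys.indexes i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
WHERE p.hobt_id = 72057594038779904;  -- hypothetical HoBt ID
```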


Figure 2-18 Viewing the Deadlock graph event in SQL Server Profiler

Should you need more data to debug further, you also have a wealth of lock information available. You might notice that in the screenshot, none of the lock events adjacent to the Deadlock graph event have anything to do with the spids with which we were working. The system acquires and releases quite a few locks even while at rest, so to look at the lock chain in more detail you’ll want to load the data into a trace table and make use of the EventSequence column to rebuild exactly what happened, in the correct order.

As with performance analysis, SQL Trace will not actually resolve the deadlock condition for you, but it will provide you with ample data to help you determine its cause and get closer to a resolution. In the next section, we’ll explore other debugging tricks with SQL Trace.

Stored Procedure Debugging

SQL Server 2005 includes many different tools to help you debug complex stored procedures. These include a Transact-SQL debugger available in Visual Studio, the ability to embed print statements or otherwise return debug statuses from your stored procedures, and the ability to raise custom errors from within your stored procedures in order to log status information. SQL Trace also supplies an underpublicized tool that can be helpful in this regard, called user-configurable events.

A user-configurable event is nothing more than a call to a system stored procedure called sp_trace_generateevent. This stored procedure accepts three parameters:

  • @eventid is an integer value between 82 and 91. Each value corresponds to one of the 10 user-configurable event classes, numbered 0 through 9: a value of 82 raises a UserConfigurable:0 event, 83 raises a UserConfigurable:1 event, and so on.
  • @userinfo is an nvarchar(128) value that will be used to populate the TextData column for the event.
  • @userdata is a varbinary(8000) value that will be used to populate the BinaryData column for the event.
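
For example, a direct call (which, as discussed below, requires ALTER TRACE permission) looks like the following sketch; the message text and payload are hypothetical:

DECLARE @payload varbinary(8000)
SET @payload = CAST(@@SPID AS varbinary(8000))

-- Fires a UserConfigurable:1 event (@eventid = 83)
EXEC sp_trace_generateevent
    @eventid = 83,
    @userinfo = N'Reached checkpoint A',
    @userdata = @payload
GO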

In a few especially tricky situations, we've had to deal with stored procedures that only occasionally failed, under circumstances that were difficult to replicate in a test environment. Long-term tracing in these situations can be difficult, since really debugging the problem might require statement-level event collection, which produces a huge amount of data when run for extended periods. A better option is to trace only at the stored procedure and batch level, and use user-configurable events to collect variable values and other state to help you debug the problem when it does actually occur. This way you can leave the system running and functional, without having to worry about your trace collecting too much data.

To set this up, you must first be aware that the sp_trace_generateevent stored procedure requires ALTER TRACE permission in order to be run. Since it's unlikely that every caller of your stored procedures will hold that permission, it is a good idea to create a wrapper stored procedure that calls sp_trace_generateevent and carries the permission itself. To make this happen, we'll employ SQL Server 2005's module signing feature. The first step is to create a certificate in the master database:

USE master 
GO
CREATE CERTIFICATE ALTER_TRACE_CERT
ENCRYPTION BY PASSWORD='-UsE_a!sTr0Ng_PwD-or-3~' 
WITH
SUBJECT='Certificate for ALTER TRACE', 
START_DATE='20000101', 
EXPIRY_DATE='99990101'
GO

Next, a login is created based on the certificate. The login is granted both ALTER TRACE and AUTHENTICATE SERVER permissions; the latter gives it the right to propagate server-level permissions to database-level modules (such as the wrapper stored procedure):

CREATE LOGIN ALTER_TRACE_LOGIN 
FROM CERTIFICATE ALTER_TRACE_CERT 
GO
GRANT ALTER TRACE TO ALTER_TRACE_LOGIN 
GO
GRANT AUTHENTICATE SERVER TO ALTER_TRACE_LOGIN 
GO

Once that's taken care of, back up the certificate, including its private key. The backup will be used to re-create the same certificate in any user database in which you'd like to use the wrapper stored procedure:

BACKUP CERTIFICATE ALTER_TRACE_CERT 
TO FILE='C:\ALTER_TRACE.cer' 
WITH PRIVATE KEY
(
FILE='C:\ALTER_TRACE.pvk',
ENCRYPTION BY PASSWORD='-UsE_a!sTr0Ng_PwD-or-3~', 
DECRYPTION BY PASSWORD='-UsE_a!sTr0Ng_PwD-or-3~'
) 
GO

For the sake of this chapter’s sample code, we’ll show you how to build the wrapper procedure in tempdb, but in reality you could do this in any user database. The following code creates a certificate in tempdb from the backed-up version, then creates a simple wrapper over the sp_trace_generateevent stored procedure:

USE tempdb 
GO
CREATE CERTIFICATE ALTER_TRACE_CERT 
FROM FILE='C:\ALTER_TRACE.cer' 
WITH PRIVATE KEY
(
FILE='C:\ALTER_TRACE.pvk',
ENCRYPTION BY PASSWORD='-UsE_a!sTr0Ng_PwD-or-3~', 
DECRYPTION BY PASSWORD='-UsE_a!sTr0Ng_PwD-or-3~'
) 
GO
CREATE PROCEDURE ThrowEvent
@eventid INT,
@userinfo nvarchar(128), 
@userdata varbinary(8000)
AS 
BEGIN
EXEC sp_trace_generateevent
@eventid = @eventid, 
@userinfo = @userinfo,
@userdata = @userdata
END
GO

To complete the exercise, the stored procedure is signed with the certificate, which effectively gives the procedure the same permissions as the ALTER_TRACE_LOGIN login. Permission is then granted for any user to run the stored procedure:

ADD SIGNATURE TO ThrowEvent 
BY CERTIFICATE ALTER_TRACE_CERT
WITH PASSWORD='-UsE_a!sTr0Ng_PwD-or-3~' 
GO
GRANT EXEC ON ThrowEvent TO [public] 
GO

Once the ThrowEvent stored procedure is created in the database(s) of your choice, you can begin using it from within other stored procedures, and, thanks to the certificate, you can do so without regard to what permissions the caller has. This can be an invaluable tool when trying to figure out the context around intermittent problems.

For example, suppose that you find during testing that one of your stored procedures, which should be updating a few rows in a certain table, occasionally updates nothing at all. The problem seems to depend on the state of yet another table at exactly the moment the stored procedure is called, but you haven't yet been able to reproduce the conditions lining up in just that way.

To help debug this, you might insert the following code into your stored procedure, just after the update:

IF @@ROWCOUNT = 0
EXEC ThrowEvent 82, N'No rows updated in MyTable', 0x0000

At this point, you would set up a trace to capture RPC:Starting and SQL:BatchStarting events, as well as the UserConfigurable:0 event. After letting the trace run for a while, once the user-configurable event fires to let you know that no rows were updated, you may have collected enough surrounding data to figure out what state the other table was in at the exact time of the update.
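A server-side version of that trace definition can be sketched as follows, assuming a hypothetical output path; the event IDs (11 = RPC:Starting, 13 = SQL:BatchStarting, 82 = UserConfigurable:0) and column IDs (1 = TextData, 12 = SPID, 14 = StartTime) come from the sp_trace_setevent documentation:

DECLARE @traceid INT, @maxfilesize BIGINT, @on BIT
SET @maxfilesize = 50  -- megabytes
SET @on = 1

-- Create the trace; the output path is hypothetical
EXEC sp_trace_create @traceid OUTPUT, 0, N'C:\debug_trace', @maxfilesize

-- Capture TextData, SPID, and StartTime for each event class
EXEC sp_trace_setevent @traceid, 11, 1, @on
EXEC sp_trace_setevent @traceid, 11, 12, @on
EXEC sp_trace_setevent @traceid, 11, 14, @on
EXEC sp_trace_setevent @traceid, 13, 1, @on
EXEC sp_trace_setevent @traceid, 13, 12, @on
EXEC sp_trace_setevent @traceid, 13, 14, @on
EXEC sp_trace_setevent @traceid, 82, 1, @on
EXEC sp_trace_setevent @traceid, 82, 12, @on
EXEC sp_trace_setevent @traceid, 82, 14, @on

-- Start the trace
EXEC sp_trace_setstatus @traceid, 1
GO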

This example is somewhat contrived, but it shows the power of adding this tool to your stored procedure debugging toolkit. The more controlled visibility you have into what's going on inside your stored procedures, the easier it becomes to track down and fix small problems before they turn into major ones. In the next section, we'll get into how to keep SQL Trace itself from causing you problems.

© Microsoft. All Rights Reserved.