Introducing SQL Server Extended Events

SQL Server Extended Events (Extended Events) is a general event-handling system for server systems. The Extended Events infrastructure supports the correlation of data from SQL Server, and under certain conditions, the correlation of data from the operating system and database applications. In the latter case, Extended Events output must be directed to Event Tracing for Windows (ETW) in order to correlate the event data with operating system or application event data.

All applications have execution points that are useful both inside and outside an application. Inside the application, asynchronous processing may be enqueued using information that is gathered during the initial execution of a task. Outside the application, execution points provide monitoring utilities with information about the behavioral and performance characteristics of the monitored application.

Extended Events supports using event data outside a process. This data is typically used by:

  • Tracing tools, such as SQL Trace and System Monitor.

  • Logging tools, such as the Windows event log or the SQL Server error log.

  • Users administering a product or developing applications on a product.

Extended Events Design and Features

Extended Events have the following key design aspects:

  • The Extended Events engine is event agnostic, which enables the engine to bind any event to any target because the engine is not constrained by event content. For more information about the Extended Events engine, see SQL Server Extended Events Engine.

  • Events are separated from event consumers, which are called targets in Extended Events. This means that any target can receive any event. In addition, any event that is raised can be automatically consumed by the target, which can log or provide additional event context. For more information, see SQL Server Extended Events Targets

  • Events are distinct from the action to take when an event fires. As a result, any action can be associated with any event.

  • Predicates can dynamically filter event firing, which adds to the flexibility of the Extended Events infrastructure. For more information, see SQL Server Extended Events Packages.

Extended Events can synchronously generate event data (and asynchronously process that data) which provides a flexible solution for event handling. In addition, Extended Events provides the following features:

  • A unified approach to handling events across the server system, while at the same time enabling users to isolate specific events for troubleshooting purposes.

  • Integration with, and support for existing ETW tools.

  • A fully configurable event handling mechanism based on Transact-SQL.

  • The ability to dynamically monitor active processes, while at the same time having minimal effect on those processes.

Using Extended Events with ETW

If you want to use Extended Events to correlate data from the operating system and database applications, we recommend that you first have a working knowledge of ETW. ETW can be used in conjunction with Extended Events or as an Extended Events event consumer. The following topics provide a starting point for obtaining background information about ETW:

Scenarios for Using Extended Events

You can use Extended Events for a wide range of monitoring and troubleshooting scenarios. The following scenarios illustrate a few of the situations where Extended Events can provide valuable data for resolving issues in such areas as:

  • Troubleshooting the cause of working set trimming.

  • Troubleshooting excessive CPU usage.

  • Troubleshooting deadlocks.

  • Correlating request activity with Windows ETW logs.

Troubleshooting the cause of working set trimming

You are experiencing severe performance issues with your production server, causing client applications to time out. These issues appear to be transient, and performance returns to normal after 10-15 minutes.

You examine the SQL Server errorlog and find the following error messages:

"A significant part of the server process has been paged out. This may result in performance degradation. Duration: 300 seconds. Current memory utilization 34%."

"Non-yielding IOCP Listener."

Note

IOCP stands for "IO Completion port." This port services user requests over the network. The message indicates that the completion port has been blocked for the last minute.

You suspect that that SQL Server is not responding quickly enough to memory pressure on the server. When you check the memory with Task Manager, there appears to be more than enough available memory on the server. You attempt to connect to the database from SQL Server Management Studio, but the connection attempt times out. You are able to connect to the server by issuing the SQLCMD - A command from the Windows command prompt. This opens a session on the dedicated administrator connection.

You decide to use Extended Events to obtain more information. You create an Extended Events session that:

  • Adds events for system memory signal and total server memory change.

  • Directs the output to ETW. This output will be written to a file that is created on a drive that is not used for either the page file or by SQL Server database files.

At the Windows command prompt you execute an instruction to enable a Windows kernel ETW trace with all memory events. You allow both traces to run for several minutes and then close the extended events session and the Windows Kernel trace.

You use tracerpt.exe to correlate both the Windows trace and SQL Server ETW trace. You look for an event indicating that the system memory low notification was set, but do not find it. You instead see a high number of page faults from all processes on the server. You examine the events just before the paging and find that the working sets of all processes were trimmed in response to a memory allocation request from a driver.

Troubleshooting excessive CPU usage

You are investigating excessive CPU usage on your production system. You use various Dynamic Management Views (DMVs) to determine if the CPU usage can be attributed to queries that were run on the system. This examination indicates that most of the queries are ad-hoc user queries. The information obtained from the DMVs is not enough to diagnose the problem.

You create an Extended Events session that:

  • Enables events for statement completion with predicates that specify CPU threshold.

  • Has an action to only collect the query plan when the event fires.

  • Writes any data that is collected to a file. This file is on a drive that does not contain any log or database files.

After starting the Extended Events session you examine the output and are able to determine that the cause of the CPU problem is a data type conversion between two commonly joined tables.

Troubleshooting deadlocks

You have been receiving reports from users indicating that certain applications are returning deadlock errors. To maximize the effectiveness of troubleshooting these problems, you decide to focus on the deadlocks that are hit most frequently. You create an Extended Events session that:

  • Configures deadlock event tracking for the session.

  • Specifies a target that aggregates based on an identifier for the deadlock.

You run the Extended Events session, and after additional deadlocks are reported you are able to obtain aggregated deadlock information, along with the complete XML deadlock graph for each source. Using this information you are able to pin point the most common deadlocks and start working on a solution.

Correlating request activity with Windows ETW logs

You are troubleshooting an application slowdown on your production server and are able to narrow down the cause to long disk-read times. You create an Extended Events session that:

  • Adds disk reads as a session event.

  • Sends the data that is collected to ETW.

After starting the Extended Events session you run a Windows ETW kernel trace. After 10 minutes you stop both sessions.

You use tracerpt.exe to merge the Windows trace and SQL ETW trace. From this merged trace you are able to correlate and track the read requests from SQL Server to the Windows kernel. This analysis indicates a long delay before the I/O request is returned back to SQL Server. Using this information you can conclusively determine that the I/O problem lies in the physical I/O path.