Troubleshooting Resource Governor

This topic provides troubleshooting guidance for conditions that may occur when using Resource Governor. This guidance is organized into the following categories:

  • Errors

  • Unexpected outcomes

  • Performance-related issues and errors

Resource Governor Errors

Resource Governor error messages cover all the actions related to configuring and using Resource Governor.

The following table provides examples of Resource Governor error messages and provides guidance on how to resolve the problem described in the error message.

Error number

Error message

Resolution

8645

A timeout occurred while waiting for memory resources to execute the query in resource pool 'myTestPool' (257). Rerun the query.

Either configure the timeout value to be higher or reduce the query load to the server.

8651

Could not perform the operation because the requested memory grant was not available in resource pool 'myTestPool' (257). Rerun the query, reduce the query load, or check Resource Governor configuration setting.

Rerun the query at later time. Reduce the query load on the server. Get the administrator to check Resource Governor configuration setting.

8657

Could not get the memory grant of 1024 KB because it exceeds the maximum configuration limit in workload group 'myTestGroup' (267) and resource pool 'myTestPool' (257). Contact the server administrator to increase the memory usage limit.

Rewrite the query to reduce memory consuming operations such as sort and hash join. Ask the system administrator to allow a higher memory usage limit.

An administrator can adjust one or both of the following parameters:

  • max_memory_percent on resource pools, which sets the maximum physical memory grant space for all queries.

  • request_max_memory_grant_percent on workload groups, which sets the per-query-limit.

An administrator can get the actual physical limit from the max_target_memory_kb column from sys.dm_exec_query_resource_semaphores.

The per-query-limit can be calculated by max_target_memory_kb * request_max_memory_grant_percent.

NoteNote
The administrator needs to make sure the required memory stated in the error message is less than the per-query-limit calculated above. However, it should be noted that increasing request_max_memory_grant_percent has a side effect of reducing concurrency of large queries. For example, users can expect to run three large queries with the default 25 percent setting, but only two large queries with a 40 percent setting.

10900

Failed to configure Resource Governor during startup. Check SQL Server error log for specific error messages or check the consistency of master database by running DBCC CHECKCATALOG('master').

Try running "DBCC CHECKCATALOG('master')".

10901

User does not have permission to alter the Resource Governor configuration.

Grant the permission that would allow modification of the Resource Governor configuration and try again.

10902

User-defined function 'dbo.rgclassifier_v1' does not exist in master database, or the user does not have permission to access it.

Create a classifier user-defined function (UDF) in master or grant the required permissions on the existing classifier UDF.

10903

The specified schema name 'dbo' for classifier user-defined function either does not exist, or the user does not have permission to use it.

Try another schema name or obtain the correct permissions for this schema.

10904

Resource Governor configuration failed. There are active sessions in workload groups being dropped or moved to different resource pools. Disconnect all active sessions in the affected workload groups and try again.

Disconnect all active sessions in the affected group(s) and try again.

NoteNote
This release of Resource Governor does not allow moving groups between pools when there are open sessions in them.

10905

Could not complete Resource Governor configuration because there is not enough memory. Reduce the server load or try the operation on a dedicated administrator connection.

Reduce the load on the server or try a configure operation on a Dedicated Administrator Connection.

10906

The object 'dbo'.'rgclassifier_v1' is not a valid Resource Governor classifier user-defined function. A valid classifier user-defined function must be schema-bound, return sysname, and have no parameters.

Provide a valid classifier UDF. A valid classifier UDF must:

  • Return sysname.

  • Have no parameters.

  • Be created with the SCHEMABINDING option.

10907

Attribute 'MIN_CPU_PERCENT' with value of 50 is greater than attribute 'MAX_CPU_PERCENT' with value of 40.

Provide a minimum value that is lesser than or equal to the maximum value.

10908

Attribute 'MAX_MEMORY_PERCENT' with a value of 40 is less than attribute 'MIN_MEMORY_PERCENT' with a value of 60.

Provide a maximum value that is greater than or equal to the minimum attribute value.

10909

The resource pool cannot be created. The maximum number of resource pools cannot exceed current limit of 20, including predefined resource pools.

Drop resource pools that are not needed.

10910

The operation could not be completed. The specified 'MIN_CPU_PERCENT' value, 25, causes the sum of minimums on all resource pools to exceed 100 percent. Reduce the value or modify other resource pools so that the sum is less than 100.

Reduce the value for MIN_CPU_PERCENT.

10911

Requested operation cannot be performed because the resource pool 'myTestPool2' does not exist.

Query the sys.resource_governor_resource_pools catalog view to see what resource pools are currently defined. Choose an existing pool or create a new pool.

10912

The operation could not be completed. Dropping predefined workload group is not allowed.

Choose a user-created workload group to drop.

10913

Users are not allowed to delete the workload group 'internal' in the 'internal' resource pool.

Create the workload group in a user-created pool or the default pool.

10914

The name of the workload group '#mygroup' cannot begin with # of ##.

Do not use # or ## when creating a group or pool.

10915

The operation could not be completed. Altering 'internal' workload group is not allowed.

Choose a user-created pool or group to alter.

Note   Changing the configuration of the default group or resource pool is allowed.

10916

Cannot drop resource pool 'myTestPool' because it contains workload group 'myTestGroup'. Drop or remove all workload groups using this resource pool before dropping it.

Drop or move out all the workload groups that use this pool, and then drop the pool.

10917

ALTER WORKLOAD GROUP failed. Either a 'WITH' or 'USING' clause must be specified.

Use either the 'WITH' or the 'USING' clause in the ALTER WORKLOAD GROUP statement.

10918

Cannot create resource pool 'myTestPool' because it already exists.

Choose a different resource pool name.

10919

An error occurred while reading the Resource Governor configuration from master database. Check the integrity of master database or contact the system administrator.

Try running "DBCC CHECKCATALOG('master')".

10920

Cannot drop user-defined function 'dbo.myclassifer'. It is being used as a Resource Governor classifier.

None.

10921

The 'default' workload group cannot be moved out of 'default' resource pool.

Not applicable.

10981

Resource Governor reconfiguration succeeded.

This message is written to the SQL Server event log.

10982

Failed to run Resource Governor classifier user-defined function. See previous errors in SQL Server error log from session ID 58 for details. Classifier elapsed time: 800 ms.

This message is written to the SQL Server error log.

Note   Prior messages in the SQL Server error log with the same server process identifier (SPID) may provide specific failure reasons. A long-running classifier may cause user login timeout. Check to see if the classifier elapsed time exceeds client login timeout.

10983

Resource Governor reconfiguration was canceled by user.

Not applicable.

10984

Resource Governor reconfiguration failed.

Not applicable.

Unexpected Outcomes

Unexpected outcomes describe situations where the various elements of Resource Governor are functioning, but the results are not what you expected. For example, session classification does not appear to work correctly, or there are issues associated with dropping or creating workload groups.

Session Classification

Sessions will go to the default workload group when the following conditions exist:

  • The classifier UDF does not exist or is not enabled.

  • The classifier UDF put them there, which indicates a flaw in the function logic.

Basic Troubleshooting

If there is no classifier UDF available for classification then all sessions will automatically go to the default workload group. After you create a classifier UDF, you need to verify that it is registered with Resource Governor and that the in-memory configuration is updated.

Creating, registering, and enabling a classifier UDF is a three step process:

  • First, you need to create the function.

    CREATE FUNCTION function_name() RETURNS <something> 
    WITH SCHEMABINDING
    
  • Second, you need to register the function with Resource Governor.

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION=schema_name.function_name)
    
  • Third, you need to update the Resource Governor in-memory configuration.

    ALTER RESOURCE GOVERNOR RECONFIGURE
    

The first thing that you should do when troubleshooting classification is verify that the function that you created is registered with Resource Governor and the configuration was updated. Use the following query to obtain the schema name (schema_name) and classifier function name (function_name) for the classifier UDF that Resource Governor is currently using.

USE master
SELECT 
      object_schema_name(classifier_function_id) AS [schema_name],
      object_name(classifier_function_id) AS [function_name]
FROM sys.dm_resource_governor_configuration

You can use the preceding approach to troubleshoot a scenario where you changed the classifier UDF but Resource Governor is still using the previous function logic to classify sessions. This behavior indicates that the changes that you made were not applied to the in-memory configuration.

Advanced Troubleshooting

You can create a very complex classifier function that does not produce the expected results or is resource intensive. If you have done the basic troubleshooting, you then need to verify that the function logic is sound. A worst-case scenario is one in which a coding flaw results in an infinite loop or a runaway query.

You can use a Dedicated Administrator Connection (DAC) to troubleshoot a poorly written classifier function because a DAC is not subject to classification and can be used while Resource Governor is running and classifying incoming sessions. For more information, see Using a Dedicated Administrator Connection.

Note

If a DAC is not available for troubleshooting, you can restart the system in single user mode. Although single user mode is not subject to classification, you cannot diagnose Resource Governor classification while it is running.

You can obtain information about the classifier function by querying the following:

  • sys.dm_exec_query_stats. (contains statement information but not the actual function)

  • sys.dm_exec_sql_text (use in conjunction with the sql_handle obtained from sys.dm_exec_query_stats)

  • PreConnect:Starting Event Class (provides ID and name of classifier function)

Reconfigure Fails

Resource Governor keeps metadata changes separate from running sessions until the ALTER RESOURCE GOVERNOR RECONFIGURE statement completes. ALTER RESOURCE GOVERNOR RECONFIGURE will fail if you try to drop a group that contains active or open sessions, or if you try to drop a resource pool that contains workload groups.

To obtain the in-memory and stored configuration, query sys.dm_resource_governor_configuration and sys.resource_governor_configuration respectively. A value of 1 for is_reconfiguration_pending (sys.dm_resource_governor_configuration) indicates that the session configuration has not been updated. If this is the case, your options are as follows:

  • Wait for the sessions to finish or drop their connection.

  • Explicitly stop the active session or drop the session connection.

  • Recreate the group or pool that you dropped, adjust its settings, and rerun ALTER RESOURCE GOVERNOR RECONFIGURE.

If there appears to be performance issues when you use Resource Governor, you need to determine whether the issue is caused by the Resource Governor configuration. The troubleshooting guidance provided in this section is grouped into two categories:

  • Session classification

  • Query execution

Session Classification

A long-running logon trigger or classifier user-defined function (UDF) can have an impact on server performance. If the logon trigger or classifier UDF takes a long time to complete, the connection times out. However, the trigger or function continues to run and use server resources.

If you suspect that there are sessions running in a pre-connected state, log on using the dedicated administrator connection and check the PreConnect:Starting Event Class to see if there are multiple requests or sessions that have started but not completed.

To resolve this issue and prevent it from happening again:

  • Stop the sessions

  • Identify the probable causes for a long running function or logon trigger

  • Remove and replace the trigger or function that is causing the problem

Query Execution

After a query is classified and executing, it may appear to stop responding (hang) or fail. You suspect that the current Resource Governor settings may be the cause. You need to investigate the following aspects of your Resource Governor configuration:

  • Request count throttling

  • Maximum CPU limit

  • CPU bandwidth throttling

  • Memory grant size

  • Memory grant timeout error

  • Out of memory error

  • Suboptimal query plan

Request Count Throttling

In this scenario, a user reports performance degradation and you suspect that the request count is throttled.

The first thing you need to do is verify whether request count throttling is explicitly configured for the group that the user is in. You do this by checking the group membership of the user to see whether the GROUP_MAX_REQUESTS setting is enabled. If GROUP_MAX_REQUESTS is not enabled, there is no explicit request count throttling. You should investigate further by taking the following steps.

  • Query sys.dm_os_waiting_tasks to see if any requests are waiting on the RESMGR_THROTTLED wait type. The presence of this wait type indicates request count throttling.

  • Start Performance Monitor and collect data using the Queued requests and Active requests counters. A non-zero Queued request count indicates request throttling.

  • Check to see if the Active requests value matches the GROUP_MAX_REQUESTS setting. If the Active requests value is higher than the GROUP_MAX_REQUESTS setting, the group may have requests that cannot be throttled (for example, open transactions).

  • If Queued requests is zero, check the Active requests of all the workload groups sharing the same resource pool because the pool may have been overloaded with too many requests.

Maximum CPU Limit

If you have a policy that is driven by Resource Governor event generation you can use the event that is generated when the maximum CPU limit is reached.

In this scenario, you want to determine whether the maximum CPU limit (REQUEST_MAX_CPU_TIME_SEC) that you configured for detecting a query that uses too much CPU is too low.

The following actions will help you validate the CPU limit setting.

  • Start a SQL Trace session and collect the CPU Threshold Exceeded event. When a user request reaches the maximum CPU usage limit, the server generates a SQL Trace event automatically. If your setting is too low, a high number of these events will be generated.

Note

This event is also exposed as a server event notification so you can write a script that reacts to the event.

  • Start Performance Monitor and collect data using the Max request cpu time (ms) counter. You can use the value of this counter as a guide for setting the appropriate limit for the workload group.

CPU Bandwidth Throttling

In this scenario, you suspect that CPU bandwidth is throttled because the CPU usage % performance counter is at or near the Resource Governor MAX_CPU_PERCENT setting. The following query returns the CPU usage % value for all the workload groups and resource pools for a SQL Server instance.

select * from sys.dm_os_performance_counters where counter_name = 'cpu usage %'

For more information, see sys.dm_os_performance_counters (Transact-SQL).

You can determine whether or not CPU bandwidth is throttled by performing the following checks on your system.

  • Check the total server CPU utilization. If a load other than SQL Server is currently active, it may affect the query that you are troubleshooting.

  • Check the distribution of CPU usage among the resource pools. A resource pool could be throttled because another pool has a high minimum value configured for CPU usage. Compare the counters for the expected (calculated CPU utilization) against the actual CPU utilization.

  • Check the workload groups assigned to the resource pool in question. Load from other workload groups can affect users that are sharing the same pool.

  • Check the distribution of the CPU usage across the schedulers. The query that you are investigating might be placed on a scheduler that contains long-running queries. In this case, the query may appear to be throttled but the actual issue is uneven distribution of load between schedulers.

  • Check for possible cases where the workload is getting blocked by other sessions rather than being throttled by Resource Governor settings.

  • Check the number of sessions currently running queries on the system. As the number of concurrent executing requests grows, SQL Server attempts to ensure that all of them are receiving at least some amount of CPU time to prevent CPU starvation.

Memory Grant Size

In this scenario, you suspect that the size of grant memory is causing a query to run slowly.

Resource Governor enforces the maximum query memory limit by reducing grant memory so large queries can fit within the limit. If a query gets less than 100% of the memory grant, it may be necessary to spill temporary data and write it to disk, which can have a noticeable effect on performance.

You need to determine the percentage of large queries to set the appropriate maximum query size limit. The following actions will help determine the best settings:

  • Query sys.dm_exec_query_memory_grants to see the current state of memory grants. The ideal_memory_kb column shows the ideal amount based on the cardinality estimate. The requested_memory_kb column shows the asking amount which could have been reduced after reaching the maximum query limit. If requested_memory_kb is significantly lower than ideal_memory_kb, then the query may end up spilling frequently (assuming that the cardinality estimate is correct).

  • Start Performance Monitor and collect data using the Reduced memory grants/sec counter. The value of this counter represents the rate of memory grant count that received less than the ideal amount after hitting the maximum request size limit. Large queries may run much slower than those with an ideal amount because they need to spill to the disk to stay within the memory limits.

To alleviate a memory grant problem, you may have to increase the pool size limit or the maximum memory size limit.

Note

If you only increase the maximum memory size, this may lead to the reduced concurrency of large queries.

Memory Grant Timeout Error

In this scenario, a query fails with a memory grant timeout error.

Both the total number of active memory grant requests and memory limits specified on the workload group and resource pool definitions can play a role in memory grant timeouts. If a single resource pool is shared by multiple resource groups, the number of concurrent queries in other groups may also affect the memory grant timeouts.

The following actions will help determine the best resource pool settings:

  • Query sys.dm_exec_query_memory_grants to see the number of memory grants and waiting queries on this group and pool.

  • Query sys.dm_exec_query_resource_semaphores to see the total granted memory and targets.

If grant memory usage is greater than the available memory space, then you can consider increasing the resource pool size limit.

Out of Memory Errors

A query fails with an out of memory error.

Basic Troubleshooting

The following actions will help determine the best workload group settings:

  • Query sys.dm_os_memory_brokers to check the relative memory distribution and trend inside the resource pool. Too many requests in too small a memory space can lead to an overloaded workload group/resource pool and cause out of memory errors.

  • Start Performance Monitor and collect data using the memory-related resource pool counters to get the target and current memory usage for memory grants, cached memory, and compile/optimizer memory. If the current values are greater than the target values, it means that the resource pool is overloaded. Consider changing the pool memory limits.

  • Start Performance Monitor and collect data using the request max memory grant counter. If the counter value exceeds the value determined by the REQUEST_MAX_MEMORY_GRANT_PERCENT setting in the workload group, the query will likely fail. Consider changing the workload group limit.

Advanced Troubleshooting

The out of memory (701) error is a generic error returned when tasks attempt to allocate a block of memory from the memory manager and the attempt fails. For more information, see MSSQLSERVER_701.

The following conditions can cause this error:

  • The memory pool reaches its total limit.

Note

Resource Governor may not be the sole cause of this condition. There may be other applications running on the server that have memory demands that contribute to this condition.

  • Multi-page or virtual address space allocation fails because the virtual address space does not have a free block big enough for the required reservation. This is most likely to occur on a 32-bit architecture, and unlikely to occur on a 64-bit architecture.

  • Multi-page or virtual address space allocation fails because the total commitment reaches the commit limit. This applies to both 32-bit and 64-bit architectures.

When you see the out of memory error, the error log is the best starting point for investigating the error. The log contains output similar to the following example:

2006-01-28 04:27:15.43 spid51 Failed allocate pages: FAIL_PAGE_ALLOCATION 1

The possible failures recorded in the error log are:

  • FAIL_PAGE_ALLOCATION, followed by the number of pages that were attempted to allocate

  • FAIL_VIRTUAL_RESERVE, followed by the number of bytes that were attempted to reserve

  • FAIL_VIRTUAL_COMMIT, followed by the number of bytes that were attempted to commit

It is important to understand that the task that triggered the out of memory error is often not the task that caused the error. Unless there is a runaway task, the out of memory condition is typically the culmination of multiple running tasks. As a result, using the very common case of a FAIL_PAGE_ALLOCATION error, your investigation must take a broader view of system activity.

The next valuable piece of information in the error log is the memory status output. Depending on the failure, you should look for single page, multi-page, virtual reserved, or committed numbers for individual memory clerks. Identifying the biggest memory consumers is a key step in continuing to investigate the error. Typically, the biggest memory consumers are of the following types:

  • MEMORYCLERK_* means that the server configuration or workload requires a specific memory allocation. There are corresponding memory clerks for SQL Server components, and individual components can have several memory clerks. For more information, see sys.dm_os_memory_clerks (Transact-SQL). You can sometimes identify the workload causing the problem from the memory clerks, but it is more likely that you will have to examine the memory objects associated with the clerks to find out what caused the large amount of memory consumption.

  • CACHESTORE_*, USERSTORE_*, OBJECTSTORE_* are the types of caches. High memory consumption by a cache may mean that:

    • Memory is allocated out of the cache but is not yet inserted as an entry that can be evicted. This is very similar to the MEMORYCLERK case above.

    • All cache entries are in use so they cannot be evicted. This can be confirmed by looking at sys.dm_os_memory_cache_counters and comparing the values of the entries_count column and entries_in_use_count column.

  • MEMORYCLERK_SQLQERESERVATIONS shows how much memory has been reserved by the query execution (QE) to run queries with sorts/joins. An out of memory error that occurs when reservations are high usually indicates a bug in the server.

The memory status output in the error log will also show which memory pool is exhausted. The memory brokers for every pool show the memory distribution between stolen (compilation), cached, and reserved (granted) memory. The numbers for the three brokers correspond to the preceding memory objects associated with the memory clerks. You can find out how much memory is allocated for a pool from a given clerk or memory object by extracting the information from a full dump using a custom script and the sys.dm_os_memory_cache_entries dynamic management view, which shows the pool_id that each entry is associated with.

If you need to contact Customer Support Services, collect the following information for our support staff:

  • The error log showing the out of memory error and the memory status output at the time of the error.

  • The output from the following statements:

    dbcc memorystatus
    dbcc sqlperf(spinlockstats)
    select * from sys.dm_os_memory_clerks
    select * from sys.dm_os_wait_stats order by wait_type
    select * from sys.dm_os_waiting_tasks
    select * from sys.dm_os_ring_buffers where ring_buffer_type='RING_BUFFER_OOM'
    select * from sys.dm_os_ring_buffers where ring_buffer_type='RING_BUFFER_RESOURCE_MONITOR'
    select * from sys.dm_os_ring_buffers where ring_buffer_type='RING_BUFFER_MEMORY_BROKER'
    select * from sys.dm_os_memory_cache_clock_hands
    
  • Optionally, an out of memory dump collected with T8004. This minidump will have valuable information such as the ring buffers and spinlock/wait stats. The dump counter for T8004 can be reset without restarting the server by turning T8004 off and back on.

Suboptimal Query Plan

In this scenario, you suspect that a query is running slowly because of a suboptimal query plan. The query optimizer may generate a suboptimal query plan if it does not get enough memory because of a low memory limit setting for a resource pool.

The following actions will help determine the best resource pool settings:

  • Get data from the Query optimizations/sec counter to see whether the workload group has a high number of query compiles.

  • Get data from the Suboptimal plans/sec counter to see if the query optimizer is frequently producing suboptimal plans.

If either of the preceding conditions exists, consider increasing the memory limit of the resource pool.