Lessons Learned

Author: Mike Ruthruff

Summary: This white paper discusses lessons learned during test scenarios in which Microsoft deployed and operated a 64-bit server to consolidate Microsoft® SQL Server® instances that previously resided on separate physical computers.

Overview

There are many considerations that need to be evaluated when determining how to approach the consolidation of many SQL Server databases on a single server. One important consideration is whether to use a 64-bit platform or a 32-bit platform to get the best consolidation results. The 64-bit platform offers some advantages over the 32-bit platform in this regard. For example, on the 32-bit platform, some technical limitations may require the use of multiple instances. But on the 64-bit platform, the decision to use multiple instances of SQL Server is more a data management decision than a technical limitation imposed by the platform.

This paper discusses lessons learned during test scenarios in which Microsoft deployed and operated a 64-bit server to consolidate SQL Server instances that previously resided on separate physical computers. It is provided as an example of how consolidation can be done with Microsoft SQL Server on the 64-bit platform. Specifically, this paper provides the following:

  • An example of a 64-bit configuration that may be used for database consolidation, including a discussion of server and storage configuration.

  • A comparison between the performance of multiple instances of SQL Server versus a single instance, including best practices for memory tuning.

  • Guidance for how to use Windows System Resource Manager (WSRM) to manage CPU resources in an environment of multiple instances of SQL Server.

  • Information about the effects of consolidating heterogeneous workloads onto the same server.

This document is not intended to answer all questions encountered during the planning phase of consolidating SQL Server databases. Further, the results of the scenarios presented in this document may not necessarily apply to all consolidated environments. For a more in-depth look at the planning necessary to successfully select and deploy environments for SQL Server consolidation, see the "Planning for Consolidation with Microsoft SQL Server 2000" resource in Appendix C. For information about the service offerings Microsoft provides for SQL Server consolidation, see the "Microsoft Service Offerings" resource in Appendix C.

On This Page

64-Bit Consolidation Environment Configuration
Single Instance vs. Multiple Instances
Windows System Resource Manager and SQL Server
Heterogeneous Workloads
Conclusion
Appendix A: Considerations for Multi-Instance Memory Contention
Appendix B: Storage Configuration and Performance
Appendix C: References
Appendix D: Hardware Configuration

64-Bit Consolidation Environment Configuration

Host Hardware Configuration

The host hardware used in the test scenarios presented in this document was the NEC Express5800/1320Xd 64-bit server. This server was configured with 32 Intel® Itanium 2 processors running at 1.5 gigahertz (GHz) and 64 gigabytes (GB) of RAM. The server was divided into two separate 16-processor partitions, each having one-half of the total memory (32 GB per partition). Each partition was configured with four Emulex LP9002 Host Bus Adapters (HBAs). Only one of the two 16-processor partitions was used for the test consolidation scenarios.

Storage Configuration

The storage configuration used in the test scenarios presented in this document was the NEC S-Series Storage disk array, used to store all of the SQL Server data files. In addition to being used for the test scenarios, this disk array was also used concurrently by the second partition of the NEC Express5800/1320Xd for unrelated SQL Server workload testing, which may be typical in real deployments where multiple hosts are using the same storage array for SQL Server workloads. To ensure there was no contention at the physical disk level when running tests, the NEC S-Series Storage was configured such that the logical unit numbers (LUNs) used by each partition resided on separate physical disks. The LUNs used for the consolidation portion of the storage were spread across a total of 56 physical spindles (or physical disks). Overall, four LUNs were created and presented to Microsoft Windows®, with a total capacity of approximately 1 terabyte. Appendix B contains a detailed breakdown of the storage configuration used in the test scenarios presented in this document.

Single Instance vs. Multiple Instances

One of the main benefits of the 64-bit platform for consolidation is the ability to scale up within a single instance of SQL Server. Unlike applications running on 32-bit platforms, applications running on 64-bit platforms benefit from a much larger virtual address space. On the 32-bit platform, applications are limited to a 4 GB virtual address space, with 2 GB reserved for kernel use. The larger virtual address space on the 64-bit platform allows for more consolidation within a single instance, and is beneficial in scenarios where you are consolidating many databases, where you have a large number of connected users to support, or where you previously partitioned the server into multiple instances due to demands on memory for structures such as locks and procedure cache. On the 32-bit platform, configuration options such as /3GB, /PAE, and Address Windowing Extensions (AWE) do allow SQL Server to utilize a greater amount of memory; however, consider the following caveats regarding those options:

  • Enabling AWE memory in Microsoft SQL Server 2000 allows SQL Server to utilize memory beyond the 2 GB limit (or 3 GB limit when the /3GB switch is used); however, the additional memory can be utilized only for data cache. Memory for structures such as locks, cached procedure plans, user connections, and cursors can reside only in the non-AWE memory area.

  • Enabling AWE memory in SQL Server 2000 results in SQL Server reserving and committing the memory at process start instead of increasing memory dynamically based on demand. This results in SQL Server holding on to this memory until process termination. This memory is also nonpageable. (An example of enabling this option follows this list.)
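For reference, AWE support in SQL Server 2000 is enabled through the sp_configure stored procedure. The following is a minimal sketch; note that the option takes effect only after the instance is restarted, and that the /PAE Boot.ini switch must be enabled for Windows to expose physical memory above 4 GB.

    -- Enable AWE memory support (SQL Server 2000, 32-bit).
    -- 'awe enabled' is an advanced option and requires an instance restart.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    EXEC sp_configure 'awe enabled', 1
    RECONFIGURE
    GO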

The 64-bit platform provides more flexibility than the 32-bit platform when deciding whether to utilize a single instance or multiple instances because of its much larger virtual address space (a theoretical limit of approximately 18 exabytes). Currently, SQL Server 2000 (64-bit Edition) supports up to 512 GB of memory, and there are no restrictions on where data structures can reside in the user portion of the address space.

In SQL Server consolidation scenarios, there are many considerations in determining whether to deploy multiple instances of SQL Server. These include application compatibility, workload isolation, flexibility for maintenance, availability requirements, and security. These issues can all be addressed by using multiple instances of SQL Server on a single server. A more in-depth discussion of each of these is included as part of the "Planning for Consolidation with Microsoft SQL Server 2000" resource referenced in Appendix C of this paper.

In order to illustrate the performance implications of running multiple instances of SQL Server versus a single instance, Microsoft tested both scenarios using identical workloads. However, before reviewing the results it is important to understand the characteristics of the different databases and associated workloads involved in the test scenarios.

Databases and Workloads

For the test consolidation scenarios, Microsoft used several databases and workloads, selected from actual production databases previously residing on separate servers. Databases were selected from a total of 10 servers, with each server containing between 1 and 25 databases. In total, 63 databases were used, with sizes ranging from 2 megabytes (MB) to 85 GB, with an average of approximately 7 GB. The cumulative data size for all of the databases involved was approximately 460 GB.

Using the databases that were selected, two configuration scenarios were established for testing:

  • Each set of databases from an individual server was restored onto a separate instance of SQL Server on the NEC Express5800/1320Xd, resulting in a total of 10 SQL Server instances.

  • All databases from all previously separate instances were restored within a single instance of SQL Server on the NEC Express5800/1320Xd.

To create workloads to test against the consolidated servers, SQL Profiler traces were captured and replayed for each of the separate instances. Identical workloads were run against each configuration, and performance was monitored during each test using Windows Performance Monitor. The workloads in the SQL Profiler trace files varied; however, most were online transaction processing (OLTP) workloads consisting primarily of transactional queries that modify existing data and insert new data. The I/O characteristics of these workloads created heavy log volume traffic (due to the large number of small write operations) and some reading and writing against the data volumes. Most of the read operations issued against the data files were selective (that is, small reads).
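Although trace capture and replay are driven through the SQL Profiler tooling, a captured trace file can also be examined from Transact-SQL, which is useful for characterizing a workload before consolidating it. A minimal sketch, with a hypothetical file path:

    -- Load a captured SQL Profiler trace file into a rowset (SQL Server 2000).
    -- The path is illustrative; 'default' reads all rollover files in the set.
    SELECT EventClass, TextData, Duration, Reads, Writes
    FROM ::fn_trace_gettable('C:\traces\oltp_workload.trc', default)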

In addition to the workloads above, additional tests were run in which workloads with different I/O characteristics were run concurrently with the replayed workloads to better capture the impact on overall system performance in consolidated environments. This subset included a decision support system (DSS) workload (also referred to as a relational data warehouse (RDW) workload) consisting of complex queries with parallel execution plans; its I/O characteristics consisted of very heavy read activity against the data volumes (that is, large reads) and very little log traffic. The subset also included a maintenance workload that performed several Database Console Command (DBCC) operations (CHECKDB, SHOWCONTIG, DBREINDEX) against a moderately sized database (25 GB). Both of these additional workloads have much more intensive I/O demands. Results using these workloads are discussed later in "Tempdb Considerations" and "Heterogeneous Workloads".

Workload performance was tested using several different tuning options available to SQL Server through the sp_configure stored procedure. Table 1.1 lists the settings used in the tests.

Table 1.1   Settings for the sp_configure Stored Procedure

Setting | Single instance | Multiple instances
max degree of parallelism (all other settings were left at their defaults; this setting was used for all tests) | 4 | 4
max server memory | 30,000 (MB) | Varied by instance, with the cumulative total being 30,000 (MB)
CPU affinity (affinity mask) | 65,535, which is equivalent to a bit mask including all 16 CPUs (1111111111111111) | Each instance was configured with 65,535
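For reference, all of these options are set through the sp_configure stored procedure. The following is a minimal sketch of the single-instance configuration in Table 1.1; max server memory and affinity mask are advanced options, so show advanced options must be enabled first.

    -- Single-instance settings used in the tests (values from Table 1.1).
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    EXEC sp_configure 'max degree of parallelism', 4
    EXEC sp_configure 'max server memory', 30000   -- value in MB
    EXEC sp_configure 'affinity mask', 65535       -- 0xFFFF: all 16 CPUs
    RECONFIGURE
    GO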

Performance Comparison of Single Instance vs. Multiple Instances

The tests that Microsoft performed demonstrated a slight performance benefit when consolidating into a single instance of SQL Server: running within a single instance slightly outperformed running identical workloads distributed across multiple instances. This result was consistent across all configurations that were tested. Figure 1.1 illustrates the average number of transactions per second during the period of time when all workloads were active. A more in-depth look at each of the tuning options and related results appears later in this document.

Figure 1.1   Transactions per second for a single instance versus multiple instances

Note that while performance is important, it is only one consideration in determining whether to use a single instance or multiple instances. There are many other important considerations that are outside the scope of this document. See "Planning for Consolidation with Microsoft SQL Server 2000" in Appendix C for more information.

Managing SQL Server Memory Usage on a 64-Bit Platform

The 64-bit platform allows SQL Server to use much larger amounts of memory than it can use on the 32-bit platform. As a result, memory configuration is critical to ensuring that all SQL Server instances, as well as other applications running on the server, have enough memory to function properly. SQL Server always attempts to leave at least 128 MB of physical memory available to ensure the proper functioning of Windows; however, there are some circumstances in which SQL Server may be unable to do so.

For example, consider the case in which multiple instances of SQL Server are running concurrently. If one instance is taken offline for maintenance while the other instances continue to run, the active instances have the potential to increase memory consumption until most of the physical memory on the server is exhausted. When the instance that was taken offline is brought back online, there may not be adequate memory to use for data cache. This can result in a period of increased I/O while the instances 're-balance' the memory utilization, and this can impact the performance of all instances using the I/O subsystem.

During testing, in order to determine the optimal values for the max server memory configuration option of each SQL Server instance, memory consumption was monitored on a per instance basis using the SQL Server: Memory Manager – Total Server Memory (KB) Performance Monitor counter. Monitoring instances running with the default memory settings helps to establish the potential memory consumption of each instance.
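The same counters are also exposed inside SQL Server 2000 through the master..sysperfinfo system table, so per-instance memory consumption can be polled from Transact-SQL as well. A minimal sketch:

    -- Poll memory consumption for this instance (values reported in KB).
    SELECT object_name, counter_name, cntr_value
    FROM master..sysperfinfo
    WHERE object_name LIKE '%Memory Manager%'
      AND counter_name LIKE '%Server Memory%'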

One benefit of consolidating within a single instance of SQL Server is that memory management is contained within a single process. During workload testing, more efficient use of the buffer pool meant that less I/O was issued to the I/O subsystem, ultimately resulting in better transactional throughput and more efficient use of CPU resources. In general, it is easier for SQL Server to manage memory consumption internally within a single instance than across multiple instances. Figures 1.2 and 1.3 illustrate the reduced I/O issued by SQL Server in a single instance as compared to multiple instances, and also illustrate the difference in processor utilization between the single and multiple instance cases.

Figure 1.2   Total disk reads per second for a single instance versus multiple instances

Figure 1.3   Total average processor utilization for a single instance versus multiple instances

Note that there are some differences in the behavior of the memory tuning options in SQL Server between the 32-bit platform and the 64-bit platform. Table 1.2 lists these differences.

Table 1.2   Memory Configuration Behavior Differences Between 32-Bit and 64-Bit Platforms

Behavior | SQL Server 2000 (32-bit) | SQL Server 2000 (64-bit)
If the max server memory option is set to a value lower than the current total server memory and RECONFIGURE is run, will SQL Server dynamically adjust memory down to the max server memory setting (without restarting the service)? | Yes | Yes
If the min server memory option is set, will SQL Server acquire physical memory equal to this setting at process start? | No | Yes (Note: This behavior is incorrectly stated in SQL Server 2000 Books Online.)
If the min server memory option is increased to a value higher than the current total server memory and RECONFIGURE is run, will the instance acquire memory up to the min server memory setting (without restarting the service)? | No | Yes

As Table 1.2 illustrates, when the min server memory option is configured on the 64-bit platform, SQL Server reserves and commits the memory at process start. While this can be used to ensure that a specific instance obtains a guaranteed amount of memory at startup, care should be taken to ensure that there will always be at least that amount of physical memory available on the server when the SQL Server instance is started. Failure to do so can result in excessive use of virtual memory within the Windows paging file, which adversely affects the performance of the entire system.

When tuning memory consumption in SQL Server on the 64-bit platform, keep the following in mind:

  • If multiple instances of SQL Server are running concurrently, explicitly set the max server memory option of each instance to ensure that enough physical memory will be available for other SQL Server instances. Make sure to also consider other applications running on the server, as well as the operating system. To avoid paging, there should ideally always be at least 128 MB of physical memory left on the system. Monitoring the Memory: Available Bytes and Memory: Pages/sec counters can help to determine how much physical memory is available on the system and whether or not the system is paging.

  • When using a single instance of SQL Server, it is recommended that you set the max server memory option so that there is adequate physical memory for other applications as well as for the operating system.

  • When using multiple instances of SQL Server, you should perform testing to determine the optimal settings for the memory options of each SQL Server instance. Monitoring total memory usage using SQL Server: Memory Manager counters as well as SQL Server: Buffer Manager counters can give you a detailed breakdown of the memory consumption of each instance. Memory usage before consolidation can be used to estimate appropriate levels for the max server memory option.

  • The min server memory setting can be used to guarantee a certain amount of memory for each SQL Server instance at process start. If you use it, also set a proper max server memory value for each instance to ensure that at least the guaranteed amount of physical memory will always be available on the server. Failure to do so could result in poor performance of the entire server due to excessive use of virtual memory. (A configuration sketch follows this list.)
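As an illustration, the following sketch configures one instance among several with a guaranteed floor and an explicit ceiling. The values are hypothetical and would be derived from the pre-consolidation monitoring described above; repeat with appropriate values on each instance.

    -- Guarantee 2 GB at process start and cap this instance at 4 GB.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    EXEC sp_configure 'min server memory', 2048   -- MB acquired at process start (64-bit behavior)
    EXEC sp_configure 'max server memory', 4096   -- MB ceiling, leaving memory for other instances
    RECONFIGURE
    GO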

CPU Affinity

On multi-processor systems, SQL Server interaction with CPU resources can be controlled by setting the affinity mask and affinity64 mask configuration options. Configuring these options causes each SQL Server thread to be scheduled on the same processor each time it runs; if you do not configure them, threads may migrate between processors. There are two main scenarios in which you might consider using the CPU affinity option.

CPU Affinity Scenario 1: Using Affinity to More Efficiently Use Memory

Many high-end multi-processor systems now use Non-Uniform Memory Access (NUMA) or Cellular MultiProcessing (CMP) architectures. The NEC Express5800/1320Xd used in the testing discussed in this document is based on the NUMA architecture. Setting the affinity mask option on NUMA systems may offer a performance benefit as a result of better utilization of local processor cache. See Appendix C for resources that provide more detailed information about these technologies.

During testing of this scenario, the affinity mask option was configured as 65,535 across all SQL Server instances, which equates to the bit mask '1111111111111111' (one bit per processor). For the workloads used, configuring the SQL Server instances in this manner resulted in slightly less total throughput (as shown earlier in Figure 1.1). It is important to note, however, that results obtained using this setting may vary and are dependent on the characteristics of each workload. It is recommended that you test on a case-by-case basis to determine whether or not this option will benefit a particular workload. Figure 1.4 and Figure 1.5 show the impact of setting the affinity mask option during testing on single and multiple instances of SQL Server.

Figure 1.4   Processor queue length for a single instance versus multiple instances

Figure 1.4 shows the processor queue length with and without the affinity mask option. Processor queue length is the number of threads that are ready to execute but are waiting in a queue for an available processor. In the test scenario, the increased processor queue length was a result of setting the affinity mask option: instead of being able to run on any CPU that was not busy, threads had to wait for a specific processor to become available. Although the processor queue length shown is acceptable for the hardware being used, the associated lower processor utilization indicates that the overall CPU resources were not being fully utilized.

Figure 1.5   Context switches per second for a single instance versus multiple instances

Figure 1.5 illustrates the average number of context switches per second. A context switch occurs when a different thread is dispatched on a processor, requiring that thread's context to be loaded for processing. Using the affinity mask setting resulted in fewer context switches; however, it also resulted in the CPU resources not being fully utilized. In all tests performed, the number of context switches per second was lower when running workloads against a single instance of SQL Server.

CPU Affinity Scenario 2: Dividing CPU Resources

A common reason to use the affinity mask option is to limit which CPUs are available to a particular instance of SQL Server. In scenarios where you are consolidating multiple instances, this approach can be used to divide the CPU resources among the instances (see the sketch below). However, although this is a way to ensure that certain CPU resources are available only to certain instances of SQL Server, it can ultimately result in the total CPU resources of the server not being fully utilized. A better approach to managing CPU resources is to use a management tool such as Windows System Resource Manager, as discussed later in this document.
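For illustration only, dividing a 16-CPU server evenly between two instances would look like the following. The masks are hypothetical, and this approach is shown for completeness given the WSRM alternative discussed below.

    -- affinity mask is an advanced option; enable 'show advanced options' first on each instance.
    -- Run against instance 1: bind to CPUs 0-7 (bit mask 0x00FF).
    EXEC sp_configure 'affinity mask', 255
    RECONFIGURE
    GO
    -- Run against instance 2: bind to CPUs 8-15 (bit mask 0xFF00).
    EXEC sp_configure 'affinity mask', 65280
    RECONFIGURE
    GO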

Tempdb Considerations

In addition to the considerations mentioned previously for using single or multiple instances of SQL Server, another consideration involves tempdb usage. The tempdb database is a resource shared by all databases within the same instance of SQL Server, and as such it has the potential to become a single point of contention.

Because its usage can vary greatly and is highly dependent on workload, there are many factors involved in deciding how best to configure tempdb. For the configuration in the test scenario, Microsoft placed the data files for tempdb on physically isolated disks within the NEC S-Series Storage. Monitoring for tempdb contention was accomplished by polling master..sysprocesses at regular intervals during workload execution. In addition, I/O activity and response times against tempdb were monitored using Performance Monitor. For the workloads in the test scenario, no contention was observed.
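The polling query itself is not reproduced here; the following is a minimal sketch of the kind of check that can be run against SQL Server 2000. Waits on allocation pages in database ID 2 (wait resources of the form 2:1:1 or 2:1:3, that is, PFS and SGAM pages) indicate tempdb allocation contention.

    -- Poll for sessions waiting on tempdb (database ID 2) resources.
    SELECT spid, blocked, waittype, lastwaittype, waitresource
    FROM master..sysprocesses
    WHERE waitresource LIKE '2:%'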

Determining how to size the tempdb database also involves many different factors. For the purpose of consolidation, you can estimate how much tempdb space is needed by monitoring usage and looking at the current size of tempdb on the separate servers. In the test scenario presented in this document, no information existed regarding prior tempdb usage or size. Consequently, for the single-instance configuration, tempdb was sized at 30 GB and autogrow was allowed in 20 percent increments. Storage space for tempdb was budgeted at 20 percent of the total data size, which came to roughly 90 GB. In the test scenario, the storage configuration had adequate space to accommodate tempdb in case it grew, and a tempdb file size of 30 GB turned out to be enough for all of the workloads (including the DSS and maintenance workloads that ran concurrently). No autogrow of the tempdb data files was observed during the workload tests; however, slightly higher than average tempdb I/O activity was observed when running the DSS and maintenance workloads concurrently with the OLTP workloads. For a detailed breakdown of the tempdb I/O activity, as well as all other I/O activity during the tests, see "Appendix B: Storage Configuration and Performance".
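As a point of reference, pre-sizing tempdb avoids autogrow pauses during workload. The following is a minimal sketch of the single-instance configuration described above; the logical file name tempdev is the SQL Server default, and the size and growth increment are taken from the test configuration.

    -- Pre-size the primary tempdb data file to 30 GB with 20 percent autogrow.
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 30720MB, FILEGROWTH = 20%)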

Decisions about how to approach tempdb configuration and sizing should include consideration of the following:

  • Isolating tempdb data files can offer an advantage in cases where tempdb I/O is very demanding and collocating tempdb files with other data files would risk negatively affecting the response time of the disks used for those data files. However, isolation can also leave the total disk resources underutilized, offers less flexibility, and requires that more thought be given to the storage configuration. Understanding how a specific application uses tempdb will help you decide where to place those data files.

  • When consolidating into a single instance of SQL Server, tempdb has the potential to become a single point of contention. It is important to understand individual server use of tempdb, because you can use this as a guide when you consolidate into a single instance.

  • You may encounter problems when consolidating SQL Server 2000 databases with different collations, because tempdb will retain the default collation of the model database. For more information about consolidated databases with different collations, see the SQL Server 2000 Books Online topic "Mixed Collation Environments".

  • On systems with eight or more processors, using 0.25 to 0.50 tempdb data files per processor allows for better scalability, especially for workloads that make extensive use of temporary objects (see the sketch following this list). See the references in Appendix C for more information.
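On the 16-processor partition used here, that guidance works out to roughly four to eight tempdb data files. A hypothetical sketch of adding one such file follows; the file name, path, and size are illustrative, and files should ideally be equally sized so that allocation is spread evenly.

    -- Add an additional tempdb data file (repeat for each file).
    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb\tempdev2.ndf',
              SIZE = 4096MB, FILEGROWTH = 20%)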

Windows System Resource Manager and SQL Server

Windows System Resource Manager is a service included with Microsoft Windows Server 2003, Enterprise Edition and Microsoft Windows Server 2003, Datacenter Edition that allows administrators to control how CPU and memory resources are allocated to applications, services, and processes. Some features of WSRM are discussed here. For more detailed information, see the references in Appendix C and the WSRM Help file installed with WSRM.

Using WSRM to manage the amount of CPU resources allocated to each instance of SQL Server is recommended. However, although WSRM also provides the ability to control processor affinity and the memory consumption of a process, using those WSRM features to manage SQL Server instances is not recommended. Tuning memory consumption or processor affinity should always be done through the SQL Server sp_configure system stored procedure (using the affinity mask, min server memory, and max server memory settings).

WSRM controls the amount of CPU resources available to the processes it manages by dynamically changing the base priority of the managed processes. Following are some general rules to be aware of when using WSRM with SQL Server:

  • WSRM monitors the total CPU consumption on the server and will adjust the priorities of the managed processes when the total CPU resources are nearly exhausted (greater than 75 percent).

  • WSRM manages processes based on predefined policies configured by the server administrator. CPU allocation policies can be defined on a per process level. A WSRM policy defines a target value for CPU allocation. These are not fixed limits for a given process; instead, they are used to determine which processes should be given priority when CPU resources are nearly exhausted.

  • By default, any process that does not explicitly match a policy is placed in the WSRM 'default' group. The default group is given (100 minus N) percent of the CPU resources, where N is the sum of the allocations of all explicit policies.

  • WSRM provides the ability to add processes to a user-defined exclusion list, in which case their access to CPU resources is not managed by WSRM. Placing processes on this exclusion list is not recommended except under special circumstances, because excluded processes have unrestricted use of CPU resources, which makes WSRM less effective for the processes it does manage.

Before discussing the scenarios tested using WSRM, note that during the WSRM-related tests the /NUMPROC=8 Boot.ini switch was used to limit the number of CPUs visible to Windows, so that the workloads would exhaust the total CPU resources of the NEC Express5800/1320Xd and WSRM would actively manage CPU usage.
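For reference, the switch is appended to the operating system entry in Boot.ini. The ARC path below is illustrative; only the /NUMPROC switch itself is taken from the tests described here.

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Datacenter Edition" /NUMPROC=8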

WSRM Scenario 1: Limiting Resources to a Specific SQL Server Instance

WSRM was tested in two different scenarios in an attempt to control which SQL Server instances had priority access to the CPU resources on the server. For the first scenario, WSRM was used to limit the amount of CPU resources given to a specific SQL Server instance, to ensure that other instances had priority to CPU resources. Workloads were run concurrently against 11 SQL Server instances. The workloads for the first 10 instances consisted mainly of OLTP-type workloads. The workload for the eleventh instance consisted of multiple streams running complex queries with parallel execution plans (a DSS-style workload). A WSRM policy was configured so that the DSS workload was explicitly given a CPU allocation target of 10 percent. The remaining SQL Server instances were managed by the default WSRM policy which split the remaining 90 percent of the CPU resources evenly across all instances. During the tests, CPU utilization and transactional throughput were measured. The results are shown in Figure 1.6 and Figure 1.7.

Figure 1.6   Transactions per second by SQL Server instance

Figure 1.6 illustrates transactions per second for each SQL Server instance running an OLTP-type workload. The total transactions per second achieved across all OLTP-type workloads was approximately 506 when WSRM was not managing the processes, and 587 when WSRM was managing the processes. By using WSRM to limit the amount of CPU resources available to the SQL Server instance running the DSS-style workload, the OLTP workloads gained approximately 14 percent in transaction throughput.

Figure 1.7   Processor utilization by SQL Server instance

Figure 1.7 shows the CPU utilization by SQL Server instance. By enabling WSRM, the average CPU utilization for the SQL Server instance running the DSS-style workload was reduced from approximately 254 percent to 206 percent (this is greater than 100 percent because on a multi-processor system, CPU utilization can reach the number of processors times 100 percent). More notably, the maximum recorded CPU utilization for the SQL Server instance running the DSS-style workload was reduced from 682 percent without WSRM enabled to 361 percent with WSRM enabled. By enabling the WSRM policy, WSRM was able to ensure that the OLTP workloads had priority to the CPU resources and the achievable throughput for those workloads was increased, even with the total CPU resources having been exhausted.

WSRM manages the CPU resources available to a particular process by monitoring total CPU utilization and dynamically adjusting the base priority of the managed process. In Figure 1.8, a Windows Performance Monitor chart shows:

  • Process: % Processor Time

  • Process: Priority Base for sqlservr.exe

  • Processor: % Processor Time (_Total)

Figure 1.8   Management of process priority using WSRM

In Figure 1.8, notice how WSRM interacts with the managed processes. The instance sqlservr#8 represents the process running the DSS-style workload, and the instance sqlservr#3 represents one of the SQL Server instances running an OLTP-type workload. When total CPU utilization exceeded 75 percent, WSRM adjusted the base priority of the instance running the DSS workload so that it fell below that of the OLTP instances. For the sake of readability, counters for the other SQL Server processes involved are not included.

WSRM Scenario 2: Giving Priority to a Group of SQL Server Instances

In the second test scenario, WSRM was used to grant more CPU resources to the three SQL Server instances that consumed the most CPU. For this test, OLTP-type workloads were run against a total of 10 SQL Server instances. A WSRM policy was defined for the three instances, with each given a 20 percent CPU allocation target. The remaining seven instances were controlled by the default WSRM policy, splitting the remaining 40 percent of the CPU resources evenly. CPU utilization and transactional throughput were measured during the tests.

Figure 1.9 and Figure 1.10 show the transactional throughput and CPU utilization for each of the 10 SQL Server instances that were tested.

Figure 1.9   Transactions per second by SQL Server instance

Figure 1.10   CPU utilization by SQL Server instance

In Figures 1.9 and 1.10, OLTP4, OLTP7, and OLTP8 represent the three instances with a CPU allocation target of 20 percent each as defined by the WSRM policy. The remaining instances are in the WSRM default group and split the remaining 40 percent of the CPU resources. Using WSRM in this manner resulted in the cumulative transactions per second across all SQL Server instances increasing from 627 with WSRM disabled to 682 with WSRM enabled, a gain of approximately 8 percent.

As both test scenarios demonstrate, WSRM successfully managed priority of access to the CPU resources on the server: it intervened only when necessary, and overall machine resources remained fully utilized. Because WSRM manages resources at the process level, multiple instances of SQL Server must be used in order to manage SQL Server with WSRM.

Heterogeneous Workloads

In the context of this document, the term heterogeneous workload refers to workloads with very different I/O characteristics. When consolidating multiple databases onto a single server, it is generally not recommended that workloads with very different I/O characteristics be placed on the same server, and more importantly, on a shared I/O subsystem. Database workloads generally fall into two categories:

  • Online transaction processing (OLTP) workloads consisting mainly of transactional queries that modify existing data and insert new data. The I/O characteristics of this type of workload are: heavy log volume traffic (a large number of small writes), and some reading and writing to the data volumes. Most of the reads being issued against the data files are selective (that is, small reads).

  • Decision support system (DSS) workloads or relational data warehouse (RDW) workloads consisting of complex queries that commonly have parallel execution plans. The I/O characteristics of these types of workloads are: very heavy read activity against the data volumes (that is, large reads) and very little log traffic.

The descriptions of workloads have been simplified here for easier discussion — the workloads you use may not always fit exactly into one of these two categories. However, understanding the I/O characteristics of a particular workload is important when you are consolidating multiple workloads together on a common set of disks. Mixing workload types on the same set of physical spindles can have adverse effects on the performance of the entire system.

For some test scenarios, Microsoft combined a DSS-style workload with OLTP-style workloads to determine how the DSS workload would affect OLTP performance. One set of tests was run with DSS data files residing on the same physical spindles as the OLTP data files, and another set of tests was run with the DSS data files residing on separate physical spindles. Figure 1.11 shows the results from these tests.

Figure 1.11   Heterogeneous workload performance

As the results in Figure 1.11 demonstrate, the I/O for DSS-style workloads generally resulted in higher disk latencies. The disk latency for reads issued against the OLTP data files dropped from 32 milliseconds (ms) to 13.5 ms when the DSS data files were isolated on separate physical spindles. Performance of the DSS workload is dependent on read throughput. This throughput was maintained when the data files were placed on separate disks, and no performance impact was observed as a result. However, separating the two workloads resulted in lower average latencies for the I/Os issued by the OLTP workloads, and ultimately resulted in better workload performance for the OLTP workloads in the test scenario. The effect of mixing workloads comes down to the I/O characteristics of the workload, the capacity of the hardware being used, and the design of the storage. You should take each of these factors into consideration if you plan to mix workloads on a single server.

Conclusion

The 64-bit platform offers some advantages over the 32-bit platform with regard to SQL Server consolidation. The decision to use multiple instances of SQL Server is more a data management decision than a technical limitation imposed by the platform. In addition to performance, there are many considerations that need to be evaluated when determining how to approach the consolidation of many SQL Server databases on a single server. This document is offered as an example of an approach for consolidation, rather than hard and fast rules for accomplishing consolidation. With proper planning and analysis, consolidation of databases can be done effectively, resulting in ease of management and reduced cost.

In summary, key lessons learned from the consolidation test scenarios are as follows:

  • You may realize a slight performance gain when you consolidate within a single instance of SQL Server. It is important to keep in mind, however, that performance is only one of many considerations. It is recommended that you review the article "Planning for Consolidation with Microsoft SQL Server 2000" that is referenced in Appendix C.

  • It is recommended as a best practice that you set upper memory limits for SQL Server instances (using the max server memory option). This is especially important on the 64-bit platform, where SQL Server instances have the potential to consume large amounts of memory. In multiple-instance scenarios you should always set the max server memory option, because memory contention can occur between the instances.

  • You can use Windows System Resource Manager to dynamically manage the CPU resources given to SQL Server instances. This may be a reason to consider using multiple instances, because WSRM gives administrators the ability to give priority to specific applications. There must be multiple instances of SQL Server in order to use WSRM to manage SQL Server resources.

  • In general it is recommended that you avoid consolidation of heterogeneous workloads onto the same server; however, this may not always be possible. If you are consolidating workloads with different I/O characteristics, give careful thought to the configuration of the I/O subsystem with respect to workload isolation.

Appendix A: Considerations for Multi-Instance Memory Contention

Memory Pressure Effect on Workload Performance

In addition to the consolidation scenarios described in this document, test scenarios using multiple instances of SQL Server were run on another 64-bit system that contained less physical memory and fewer CPU resources than the NEC system. The result was that the SQL Server instances had to compete for the available memory on the server. The tests were conducted on a four-processor 64-bit Itanium system configured with 12 GB of RAM, and during the tests the overall impact on performance was measured.

SQL Server continues to consume memory for its buffer pool until the server is restarted, a configuration change is made to the max server memory setting, or there is memory pressure from other applications on the system. To determine the effect of memory pressure on workload performance, an instance of SQL Server was started after another instance of SQL Server had acquired most of the physical memory on the server. Before running the tests, a performance benchmark was conducted in which both instances were brought online at the same time (scenario A). For the next test (scenario B), instances were configured with max server memory and min server memory set to 0, which resulted in SQL Server consuming memory until only 128 MB of physical memory remained available. In the last test (scenario C), the amount of memory consumed by the instance already online was limited to 10 GB of the available 12 GB by setting the max server memory option. Table A.1 outlines the workload scenarios that were run.

Table A.1   Contention Scenarios

Scenario | Instance 1 | Instance 2
Scenario A (benchmark without memory pressure) | Brought online at the same time as instance 2. | Brought online at the same time as instance 1.
Scenario B | Brought online after instance 2 had consumed all physical memory up to its max server memory setting and was left idle. | Brought online before instance 1. Consumed all available RAM up to the max server memory setting and was then left idle. This scenario was run twice: once with max server memory set to zero (unlimited) and once with it set to 10 GB.
Scenario C | Brought online after instance 2 had consumed all physical memory up to its max server memory setting and was active with an OLTP workload. | Brought online before instance 1. Consumed all available RAM up to the max server memory setting, and then the OLTP workload started. This scenario was run twice: once with max server memory set to zero (unlimited) and once with it set to 10 GB.
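As a point of reference, the 10 GB cap used in scenarios B and C corresponds to a configuration along these lines (a minimal sketch, run against the instance already online):

    -- Limit the running instance to 10 GB so an incoming instance can acquire memory.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    EXEC sp_configure 'max server memory', 10240   -- value in MB
    RECONFIGURE
    GO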

Figure A.1 illustrates the transactional throughput realized by the workload for scenarios A, B, and C described above.

Figure A.1   Transactions per second for SQL Server instance coming online (after another instance had consumed nearly all physical memory)

Figure A.1 shows the impact on transactional throughput of bringing a SQL Server instance online when another instance has already consumed a portion of the available memory. Scenario A illustrates the transactional throughput that is achievable when both instances are brought online at the same time (with no memory pressure between the instances). For scenario B and scenario C, the blue bars represent the transactional throughput for an instance being brought online when another instance had already consumed approximately 11.5 GB of the available 12 GB (the max server memory option for the instance that was already online was set to zero, or unlimited). For scenario B and scenario C, the red bars represent the transactional throughput achieved by the instance being brought online when another instance had consumed 10 GB out of the 12 GB of physical memory, leaving a small amount of memory for the instance coming online (memory consumption for the instance already online was limited to 10 GB of the 12 GB by setting the max server memory option to 10 GB). Scenario B and scenario C illustrate the difference when an instance is being brought online against an instance that is idle versus against an instance that is active.

Figure A.1 illustrates that limiting the amount of memory available to one SQL Server instance can result in better performance for an instance that is subsequently brought online. In scenario B, bringing an instance online against an idle server, leaving the first instance unconstrained reduced the achievable transactions per second by 55 percent; setting an upper memory limit on the first instance, which left a small portion of memory available, greatly improved performance. In scenario C, bringing an instance online against an active server, the difference was even greater: the constrained case achieved roughly three times the transactions per second of the unconstrained case (a difference of 203 percent).

Figure A.2   Total disk reads per second for entire disk subsystem (while multiple instances are competing for memory)

As Figure A.2 illustrates, in addition to the reduced transactional throughput caused by delays in acquiring buffer pool memory when the instance comes online, there is also more disk I/O, which impacts performance for both SQL Server instances. The SQL Server instance being brought online is unable to cache the requested data pages, resulting in increased disk activity. By limiting the memory consumption of the instance already online, the instance being brought online is better able to cache data pages, resulting in reduced I/O across the entire system.

Appendix B: Storage Configuration and Performance

For the consolidation testing discussed in this document, Microsoft used the NEC S-Series Storage. To ensure that there was no contention at the physical disk level with other, non-benchmark workloads, the NEC S-Series Storage was configured so that the logical unit numbers used by each partition resided on separate physical disks. The LUNs used for the consolidation portion of the storage were spread across a total of 56 physical spindles (or physical disks). Overall, four LUNs were created and presented to Windows with a total capacity of approximately 1 terabyte.

As Figure B.1 illustrates, the logical devices (LDs) used for the consolidation testing resided on physically isolated spindles. Because a physical spindle is the smallest component of a LUN on the NEC S-Series Storage, designing for isolation is easy to accomplish. Isolation between the data and log files was used to group files with common I/O characteristics and to reduce log write latency. Tempdb was isolated from the data files to ensure good data file performance, because little information was available about the tempdb characteristics of most of the workloads.

Figure B.1   Configuration of logical physical disks

Table B.1 shows a breakdown of the use of each of the logical unit numbers.

Table B.1   Logical Unit Number Configuration

LUN Number | Size | Volumes | Use | Spindles
LD0 – LD5 | Not used | Not used | Not used | 54
LD6 | 265 GB | Data1 (132 GB), Data2 (133 GB) | Data files | 16
LD7 | 133 GB | Log1 | Log files | 8
LD8 | 265 GB | Data3 (132 GB), Data4 (133 GB) | Data files | 16
LD9 | 265 GB | Data5 (132 GB), Tempdb1 | Tempdb and backup data | 16

Storage area network (SAN) configurations used with SQL Server can vary greatly depending on the characteristics of the particular usage scenario. For the test scenarios discussed in this document, the following considerations shaped the storage configuration:

  • Due to the number of databases in the scenario, it was impossible to attain physical isolation between the log and data set for each database. Instead, physical isolation boundaries were established between all log, data, and tempdb files so that files with similar I/O characteristics would be grouped together.

  • The two logical unit numbers that were used for data files were divided into four logical volumes using the Windows Disk Management MMC snap-in. This was done to allow for more parallelism during backup and restore operations, because those operations create one reader and one writer thread per volume. Data files were then distributed evenly by size across all volumes. This also allowed for more granular monitoring using Logical Disk Performance Monitor counters.

  • Simplicity of the storage configuration was favored over attempting to fine tune for specific workloads, since the goal of the testing was to determine how consolidated workloads behaved together, including OLTP with DSS-style workloads. In many deployments, it may be advantageous to logically or physically group the data files based on I/O characteristics, based on logical business functions, or both.

  • All I/O traffic between the host server and the NEC S-Series Storage was load-balanced (in a round-robin fashion) across the four HBAs using NEC Storage PathManager 3.0 Enterprise for Windows, a multipathing software solution based on Microsoft Multipath I/O (MPIO) technology. In general, using a load-balancing solution is recommended because it results in easier configuration and provides redundancy in the case of an HBA failure. A direct connection was made between the host server and the Fibre Channel ports on the NEC S-Series Storage.

The following figures show the I/O performance achieved with the NEC S-Series Storage while running workloads during the test consolidation scenarios. They are included as a reference for the type of I/O performance that is possible with the configurations that were tested.


Figure B.2   Single instance disk throughput (OLTP and DSS workloads concurrently)

Figure B.3   Single instance disk throughput (OLTP workloads only)

Figure B.4   Multiple instance disk throughput (OLTP and DSS workloads concurrently)

Figure B.5   Multiple instance disk throughput (OLTP workloads only)

Appendix C: References

NEC S-Series Storage and Express5800/1320Xd

https://www.necstorage.com

https://www.necstorage.com/product/san/s4300/index.shtml

https://www.nec64.com

Planning for Consolidation with Microsoft SQL Server 2000

https://www.microsoft.com/technet/prodtechnol/sql/2000/plan/SQL2KCon.mspx

Microsoft Service Offerings

https://www.microsoft.com/services/microsoftservices/default.mspx

Windows System Resource Manager Home

https://www.microsoft.com/windowsserver2003/technologies/management/wsrm/default.mspx

Microsoft Storage Technologies – Multipath I/O

https://www.microsoft.com/windowsserversystem/storage/technologies/mpio/default.mspx

Microsoft KB Article 271624 INF: Using DBCC MEMORYSTATUS to Monitor SQL Server Memory Usage

https://support.microsoft.com/?id=271624

Microsoft KB Article 314546 HOW TO: Move Databases Between Computers That Are Running SQL Server

https://support.microsoft.com/?id=314546

Microsoft KB Article 328551 FIX: Concurrency Enhancements for the Tempdb Database

https://support.microsoft.com/default.aspx?scid=kb;en-us;328551

Appendix D: Hardware Configuration

Server

NEC Express5800/1320Xd

  • 32 Intel Itanium 2 1.5 GHz processors

  • 64 GB of RAM

  • 4 Emulex LP9002 PCI Host Bus Adapters

  • This configuration was split into two partitions, each with 16 processors and 32 GB of RAM. The consolidation testing utilized one partition.

Storage

NEC S-Series Storage Disk Array

  • 105 disks (15,000 RPM, 36 GB each)

  • 16 GB of cache

  • Configured with RAID 1+0

    NEC Storage PathManager 3.0 Enterprise for Windows, a multipathing software solution based upon Microsoft Multipath I/O (MPIO) technology, was used to balance I/O traffic across all four Host Bus Adapters

Fabric Switch

No switch (direct connect)

Operating System

The 64-bit version of Microsoft Windows Server 2003, Datacenter Edition

Database Server

Microsoft SQL Server 2000 (64-Bit Edition)