
Chapter 33 - The Data Tier: An Approach to Database Optimization

This chapter describes a basic approach to optimizing the data tier, a concept based on the distributed Web-based architecture for the .NET Enterprise Servers (which include Microsoft® SQL Server™ 2000). The data tier encompasses the database server as a whole, a system comprised of several interrelated layers: the SQL code, the database design, the data storage components on the physical disk, and the server configuration. Given this broad definition, the data tier brings into play three activities that had previously been treated as separate concerns: design, performance tuning, and hardware sizing (capacity planning). In this chapter, you will learn how all three activities can be brought together into one approach. The particular aspects of optimizing the data tier described in this chapter are understanding the cycle of optimization, evaluating system performance, and making strategic hardware choices.

A New Approach


Optimization of any system obeys the Pareto Principle, also known as the 80/20 Rule, which states that a vital few factors are responsible for producing most of the problems. Applied to the data tier, this means the biggest optimizations are achieved at the business process end of a system, followed by the application code, then the database schema and physical data storage, and finally the server configuration.

In a business environment that does not yet have a unified approach to solving business problems by combining business process with technology, system administrators tend to apply optimization to the parts of the system that are within their reach. In fact, prior to the architectural enhancements made in SQL Server version 7.0 and SQL Server 2000, this was the standard approach. The seemingly logical order was to build on the server as a foundation, move to the database configuration parameters, and only then work upward to the parts that are harder to reach because changing them involves a larger group of people. This approach was heavily weighted toward the server, as shown here:

[Figure: the traditional approach, with optimization effort weighted toward the server]

However, applying the Pareto Principle reveals that this approach is not going to produce the best resolution to the problem, in part because SQL Server has become a more advanced product. As SQL Server has changed over time, the central focus of optimization has shifted toward application tuning. The database administrator (DBA) can focus on areas that will have more impact (database and application design), without needing to spend so much time on areas that have less effect long term (server hardware and configuration).

The following diagram illustrates the data tier elements, and an approach to optimization that moves from left to right:

[Figure: the data tier elements, with optimization proceeding from left to right]

Ideally, all four columns should be addressed; otherwise, the application will not be tuned optimally. In cases where tuning must be approached sequentially, the order of effort becomes extremely significant. The elements on the left are the most difficult to change, but they also give the largest rewards in terms of performance. For example, while some gains in performance may be achieved by tuning the environmental settings (sp_configure), this approach will not reach the level of performance that could be reached by tuning the code elements. Only large systems, in terms of concurrency or transaction volume, need to worry about elements three and four, the storage components and the server components. The new approach treats optimization as a cycle.

Optimization Cycle


Optimizing the data tier is a process with defined steps. It is important to remember that a system changes over time due to gradual shifts in usage patterns, transaction load, application or optimization changes, and hardware upgrades. Optimization is therefore a cycle. To stay on top of it, keep a schedule. Optimization or maintenance updates can be included in the regular release cycle, or implemented separately. It is better to have a scheduled plan than to wait for a production issue to force one.

The first step in optimizing your system is an obvious one, but bears mentioning: Look at what you have. Observe and make records of the following information in relation to the server, so you will have a very clear picture of your environment before you begin to make changes to it.

  • What is the formal service level agreement (SLA), if any, for this system (everything from hardware to software)? An SLA should include at a minimum the required uptime and usage hours of the system as a whole, the transaction load that must be supported, and the response time windows for emergency situations. Once you have located (or created) an SLA, observe if the current system performance falls within its bounds—you might save yourself wasted work by focusing on what the users believe to be important. 

  • What applications touch the data? There may be several using the same tables, or separate tables within the same database. What other things rely on the data? For example, is the data exported nightly to a data mart? Does some data get transferred to other application systems? Also, find out if this is a third-party application or custom-built. This will tell you how much you can change in the database. 

  • What is the basic design of the system and how is database interaction handled? Do they use ADO or OLE DB? Do they maintain open connections or get a connection each time (as from a Web page)? Do they directly access the tables to run pass-through queries? Do they use stored procedures for all or some of the queries? How is security controlled? 

  • Open the database you are going to be focusing on. Note how many of each object the database has.

  • How many and how large are the databases (and tables/indexes) on the server? How many users are there, and how many of them are online?

  • Check sp_configure for any discrepancies from the defaults, and make a note of each change and the reason for it. For more information, see "Setting Configuration Options" in SQL Server Books Online. (A short script for capturing this, along with the version information below, is sketched after this list.)

  • Where are the data files located? Transaction log files? tempdb?

  • How many spindles (disks) do you have? How are they laid out, and what is the RAID level? If you have external storage, note the cache settings, and which channels are associated with which drives. 

  • Where are the results of running the sqldiag tool stored for easy access? A great deal of information about the server can be obtained by scheduling sqldiag on a regular basis. Save the output to compare server changes over time with other issues that arise on the system. The output from sqldiag is also usually required if you need to call in a support issue. For more information, see "sqldiag Utility" in SQL Server Books Online.

  • What other server software is running along with SQL Server? How many instances of SQL Server are using the same server resources? 

  • What version, service pack, and hotfixes are you running, for both the operating system and SQL Server? 

  • Check the drivers for all of your hardware, to make sure that you are using the most recent version.
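
Much of this inventory can be captured quickly with a few Transact-SQL commands. The following is a minimal sketch, not a complete survey; run the last two statements in the database you are focusing on.

    -- List every configuration option, including advanced options, so the
    -- run_value column can be compared against the documented defaults.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure

    -- Version, service pack level, and edition of this instance.
    SELECT @@VERSION
    SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
           SERVERPROPERTY('ProductLevel')   AS ProductLevel,
           SERVERPROPERTY('Edition')        AS Edition

    -- Databases on the server; run sp_helpdb 'dbname' for file locations.
    EXEC sp_helpdb

    -- In the database of interest: space used, and object counts by type.
    EXEC sp_spaceused
    SELECT type, COUNT(*) AS object_count
    FROM sysobjects
    GROUP BY type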

Compile this information for each of your applications in advance. Store it in a standard place on every server. For example, you might create a standard directory (C:\SQLAdmin) on all servers and store all of the database administration report output there. This information could later be collected onto one server and imported into a SQL Server database for trend analysis. For ease of administration, make the data available on a Web site for the database administration team. The point is that the material should be easily accessible.

In a crisis situation, you may rely on this information to help you decide what to fix. Do not try to troubleshoot as you record; just make notes for now. For items that require involving staff in other groups, make a note that you are awaiting the information and move on. Do not spend an excessive amount of time on this; the object of this stage is simply to get a clear idea of the bigger picture. Keep all of this information in mind, and move on to the next step.

For information about tracking server and application information, search for "Configuration Management" and "Process Model for Operations" at http://www.microsoft.com/technet.

Evaluating the Situation


In any situation, you must collect as much data as is feasible given your resources and the time you have. Once you have the data, analyze it for potential opportunities to correct problems or to optimize based on usage patterns. It is best to collect performance statistics over time, so you can analyze the system patterns, and make well-informed decisions on changes to be made. The pattern of statistics over time is called a baseline. The process of collecting baseline data can be started at any point, regardless of your current state of optimization (or lack thereof). The point is that the data becomes useful for analysis over time, meaning as aggregate data.

In addition to collecting system statistics and configuration information, remember to talk to the users of the system. Often, they can easily identify which parts of the system are slow, and whether they are slow at certain times. This helps in two ways: it focuses you on problem areas more quickly, and it helps you fix the problems that are having the most impact.

After you have collected information on all the parts of the data tier, you are in a better position to judge what settings to use in monitoring for optimization opportunities. Several performance monitoring tools are discussed later in this chapter.

Unless your system is completely new, you should collect data on the production system first. Then, if possible, test potential changes on a test system. Create a script (either an SQL script, or a list of steps to perform) and use this to test the implementation process of your proposed changes on the test server. Using the same settings used to monitor production, monitor the same scenarios on the test server.

After confirming that your proposed changes look good, schedule the production implementation. It is important to note that even in a crisis situation, you should observe this step. In a crisis, your "implementation schedule" might simply consist of notifying your change control department and your help desk of the impending change and what issues you expect to occur during the change process (for example, downtime, or slow connections).

After implementation, you should promptly follow up by repeating the same monitoring processes you used to observe the problem. This will confirm (or refute) that your performance or problem-solving changes have had the desired effect, or that any changes made for the sake of administrative purposes have not caused a problem. Keep in mind during this confirmation period that, even with the best planning, you may find that another problem has arisen, either as a result of your changes, or because a secondary underlying issue is now made clearer in the absence of the first issue. Optimization is an ongoing process.

Performance Monitoring Tools

The following tools can help in locating problems during production hours:

  • SQL Profiler 

  • System Stored Procedures (sp_lock, sp_who2, and so on) 

  • System Monitor in Microsoft Windows® 2000 (known also as Performance Monitor) 

The primary benefit of using these tools to monitor performance is to learn from the data you collect over time. Therefore, it is essential to import the data into SQL tables, create aggregate tables to focus on specific details, and then analyze the information for patterns. One good way to do this is to use SQL Server Analysis Services. When you are able to see performance on a larger scale, then you will be able to identify which parts of the system you can apply the Pareto Principle to. In other words, you will easily be able to identify where you can make changes that will have the largest effect on the system.

SQL Profiler

SQL Profiler is useful for observing events in a SQL Server environment. It can be scoped to many levels of detail, ranging from every transaction in an instance all the way down to a specific event for a specific session. Use SQL Profiler to:

  • Monitor the performance of an instance of SQL Server. 

  • Identify slow-executing queries. 

  • Test and debug SQL statements and stored procedures in the development phase of a project by stepping through statements to confirm that the code works as expected. 

  • Troubleshoot problems in SQL Server by capturing events on a production system and replaying them on a test system.

  • Replay manually created SQL test scripts. 

  • Audit and review activity that occurred on an instance of SQL Server.

When you set up a new session in SQL Profiler, you can specify what events to capture and to what level of detail. You can also specify the set of circumstances in which to capture the events, and you can choose to capture event data to either a file or a table for later analysis. A stored trace can be replayed against an instance of SQL Server, effectively re-executing the saved events as they occurred originally. You can also create your own SQL scripts and use SQL Profiler to replay these. In a testing scenario, the replay option is the real value of SQL Profiler. For information specific to creating traces to be replayed, see "Replaying Traces" in SQL Server Books Online.

A very useful feature of SQL Profiler is the ability to create templates. These can be created and reused for specific monitoring situations. SQL Profiler comes with a set of common templates that you can use to make this process easier, especially if you are new to the tool. This way you can develop particular views of your system that will help you track specific performance issues.

To run a template against your system, open SQL Profiler; on the File menu, point to New, and then click Trace. A connection dialog box appears so you can connect to the appropriate instance of SQL Server. On the next screen, you can select the template to use, and choose whether to save the trace to a file or a table. It is very important to record traced information for future analysis. The best way to do this is to save the trace to a file (but not on a drive used by SQL Server), and then copy the data from the file into a table for analysis.

System Stored Procedures

Monitoring can be performed through the use of system stored procedures and functions as well:

sp_who and sp_who2
    The sp_who procedure provides information such as the system process ID (SPID), status, associated login, host name, blocking process, database name, and command type. The undocumented sp_who2 provides the same information, plus CPU, disk, and memory usage, and the calling program name for each SPID.

sp_lock and OBJECT_NAME()
    Use these to help track down improper data access design, or to find ways to balance the locking required by business needs against isolation level settings. They are especially useful in tracking down blocking issues. You can also use trace flag 1204, which returns the types of locks participating in a deadlock and the current command affected.

fn_virtualfilestats(db_id, file_id)
    Returns I/O statistics for database files, including log files.

fn_trace_geteventinfo(trace_id)
    Returns information about the events traced.

fn_trace_getinfo(trace_id)
    Returns information about a specified trace or all existing traces.

sp_trace_create
    Creates a trace definition. The new trace is in a stopped state.

sp_trace_generateevent
    Creates a user-defined event.

sp_trace_setevent
    Adds an event or event column to a trace, or removes one from it.

sp_trace_setstatus
    Modifies the current state of the specified trace: start, stop, or close.

fn_trace_getfilterinfo(trace_id)
    Returns information about the filters applied to a specified trace.

sp_trace_setfilter
    Applies a filter to a trace.

fn_trace_gettable(filename, number_of_files)
    Returns trace file information in a table format.

For more information, see "Monitoring with SQL Server Enterprise Manager" in SQL Server Books Online.

Also, while the SQL Profiler interface is most commonly used, some system procedures correlate to the SQL Profiler actions and allow you to control profiling from within SQL Server, such as from within a procedure or job. It is worth noting that these have all changed in SQL Server 2000.

For simplicity, you can use the Profiler interface to set up a trace, and then script the trace settings to SQL Server (on the File menu, click Script). A good use for a scripted trace is to coordinate simultaneous monitoring with different tools. Another practical use is to capture events occurring at a specific time, so you do not have to be there to do it manually. By running the scripted trace from a scheduled job, you can make this the first step in a process that will also import the trace data and transform it into a set of report tables, so it is ready for your viewing. For more information, see "Creating and Managing Traces and Templates" in SQL Server Books Online.
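
Once a trace file exists, it can be pulled into a table for that kind of reporting with the fn_trace_gettable function listed earlier; the file path and table name in this sketch are placeholders.

    -- Import a saved trace file into a table for analysis.
    SELECT *
    INTO dbo.TraceAnalysis
    FROM ::fn_trace_gettable('C:\SQLAdmin\MyTrace.trc', default)

    -- List the traces currently defined on this server.
    SELECT * FROM ::fn_trace_getinfo(default)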

System Monitor

System Monitor (formerly called Performance Monitor in Windows NT® 4.0) provides detailed data about the resources used by specific components of the operating system and by server programs that have been designed to collect performance data. You can use the data you gather to:

  • Understand your workload and the corresponding effect on system resources.

  • Observe changes and trends in workloads and resource usage so you can plan for future upgrades.

  • Test configuration changes or other tuning efforts by monitoring the results.

  • Diagnose problems and target components or processes for optimization.

System Monitor data is collected in counter logs, which store data for the specified counters at regular intervals, or in trace logs, which record data when certain activities, such as a disk I/O operation or a page fault, occur. A parsing tool is required to interpret the trace log output; developers can create such a tool using APIs provided on http://msdn2.microsoft.com/default.aspx.

The performance data is typically displayed in a graph form, whether you are running the monitoring tool "live" or whether you are viewing previously saved counter logs. You can also create alerts to notify you when a counter value reaches, rises above, or falls below a defined threshold. Some sample templates to help you isolate specific types of problems, and also to give you some ideas for ways to use this tool, are available on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement.

One feature of System Monitor is that it is a Microsoft ActiveX® object. This means that you can easily incorporate a view of a log into a document or a Web page. This can be useful for both issue and trend analysis. It would also be useful to centralize administration if you have a team Web site.

You can use SQL Profiler and System Monitor simultaneously on the same system to gather more detailed time-related information on the interaction between hardware and software. For more information about System Monitor counters and interpreting them, see the System Monitor online help. Also consult the Microsoft Developers Network (MSDN) at http://msdn2.microsoft.com/default.aspx.

Staging a Test

You can use these tools against a production system to monitor real-time performance, or you can stage a test of the system on a test server. To do this, you should ideally have a server available which closely matches the production configuration. Make notes on the differences between the systems, as these differences may affect the level of accuracy in the test results. For example, if your production system has an external array of disks, and your test server does not, you will not be able to get a completely accurate picture of disk usage on the test server. On an application level, however, you can gather a lot of very useful information on the performance of a system.

On a basic level, you can test a small set of queries for performance comparisons to the production system. In this case, you would simply turn on SQL Profiler (selecting the appropriate events or using a template), and run the queries in SQL Query Analyzer (with "Show Execution Plan") to get a basic feel for how it is working. However, for a large test you will need to analyze usage patterns on your production server, or create simulations based on the test environment, and then create scripts that can be replayed to simulate many users working concurrently.
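
For the small-scale comparisons described above, the SET options below report I/O and timing statistics in text form, which is easy to save alongside your notes; the query shown is only a placeholder.

    -- Report logical/physical reads and elapsed time for a test query.
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    GO
    -- dbo.Product and its columns are hypothetical names for the query under test.
    SELECT ProductName, ListPrice
    FROM dbo.Product
    WHERE ProductType = 3
    GO
    SET STATISTICS IO OFF
    SET STATISTICS TIME OFF
    GO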

Two levels of testing a system are interrelated: the application and the data tier. Testing either one alone will not provide a complete picture of the scalability or performance of a system. Testing the front end to learn what transaction load it can support involves creating a recording (or script) of actions performed on the front end or Web page. These are replayed by testing software, and provide feedback on how much of a load the application can support. This type of testing, performed for the purpose of scalability and reliability, is known as stress testing, and is an important part of ensuring the success of a system prior to production implementation. Many full-featured third-party tools are available to test your application. Microsoft provides a free tool for basic Web testing: the Web Application Stress Tool. For more information, see http://msdn2.microsoft.com/library/ and search for "Web application stress tool."

Some data tier information can be gathered during these tests, but you will achieve more thorough results if you test the database as well. Be aware that the way the application works will necessarily have an effect on database performance. For example, if your application connects to the database from Internet Information Services (IIS), and it has data caching turned on, then some queries which look like "frequent use" queries may be used only once: when the cache is refreshed. This might be the case for a re-pricing function for all "sale items" on an e-commerce site. If this function dynamically prices each item every time the Web page is called, this looks like it would be a huge problem. If data caching is turned on, however, then this is no longer an issue because the pricing function will only run when the cache is refreshed (a time increment you should be aware of). This function would still be on your list of performance issues, but the usage of it would lower the priority to the bottom of your list.

The contents of the cache during a test have a significant impact on the results. A cold cache is when nothing is in memory, and all data is read from the disk; a warm cache is when the system has been running and most data and stored procedures are loaded and compiled (for example, all the lookup tables); and a hot cache is when the entire data set required is in memory.

A performance test that does not take caching into account can lack repeatability, and therefore have less value. It is best to clear the cache prior to starting each test, either by stopping and restarting the SQL Server services or by running DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE. Alternatively, you can start recording the test results after a suitable warm-up period, or write some hot-cache code to get things loaded into the cache. To do this, simply make a script that runs the necessary stored procedures, runs DBCC PINTABLE, or queries the tables in such a way that you get the required table or index scans.
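
As a minimal sketch of that idea, the following clears the cache before a cold-cache run and then warms it; the table and procedure names are hypothetical, and the script should be run in the test database.

    -- Cold cache: write out dirty pages, then drop data pages and compiled plans.
    CHECKPOINT
    DBCC DROPCLEANBUFFERS
    DBCC FREEPROCCACHE

    -- Warm cache: touch the objects the test relies on.
    DBCC PINTABLE (DB_ID(), OBJECT_ID('dbo.LookupRegion'))  -- keep a small lookup table in memory
    SELECT COUNT(*) FROM dbo.LookupRegion                   -- pull the table's pages into cache
    EXEC dbo.usp_GetOpenOrders @CustomerId = 1              -- load and compile the procedure plan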

For information about testing, search for "Testing Process," at http://msdn2.microsoft.com/library/.

Monitoring and Optimizing


The previous section covered the basics regarding the tools and the general process; this section provides examples of how you would use SQL Profiler, system stored procedures, and System Monitor to approach a specific problem: slowness of response time. Lack of speed is one of the most common complaints about applications, particularly when that application has outgrown its original design, or was inefficiently designed from the start. A close examination of this problem will reveal more about how proper design and configuration can free the server to scale to whatever level you require. Because a full explanation of all possible scenarios is outside the scope of this chapter, the examples will be limited to the following scenario.

Sample Scenario: Suppose you have accepted a new job managing the database systems of a mid-sized company. They are growing, and now need someone to optimize their flow of data. Clearly, you need to define and document the servers you are now responsible for. In the course of exploring the data environment, you learn from the company help desk that two of the systems are getting complaints of slow response time.

In addition to showing the steps to follow for our example, these general guidelines can be applied in any testing, tuning, or troubleshooting situation.

Monitoring a System

Before starting an optimization project, you should have already gathered as much information about the system as possible, paying special attention to how the application works and how it is used.

In terms of the previous scenario, assume that this is a custom-built application, and that the two applications running on the database are the company's order system and a trend analysis reporting application. These two systems access data in very different ways, and to be fully optimized they would be configured differently from the hardware level up. Assume that, due to a business need, the design of these systems was less than optimal. The reporting application consists of an interface that queries a set of report and parameter definition tables, and then runs stored procedures or embedded SQL based on user input. In addition, reports are run during peak hours. If reports were instead run during off-peak hours, or once every week or month, the reporting performance issues would be reduced.

Knowing how the application is used, and what is important to the users at what times, should help you understand what should be improved first, and may also give you some clues on what issues you might expect to find in the data tier. The application information will save you a lot of time in the long run, and focus your tuning efforts to make them more effective.

For the purposes of the example, looking at the overall problem and the application information, you choose to go through some basic steps, and then focus in on the possibility of problems with the indexes and the stored procedure code.

Address Operating System and SQL Server Errors

Before beginning work on optimization, you should make sure you are starting with a clean server. Check the Event Viewer system and application logs on the server. Do not check just SQL Server messages—skim through all of the messages. Then, check the SQL Server error logs.

For example, suppose you were investigating slow or failed connections to the server, and you opened Event Viewer to find an error indicating that the SQL Server computer cannot reach the primary domain controller. You should have that error resolved first and then, before proceeding with any further investigation, verify the status with the users reporting the original problem. This way you do not spend time on a problem that was not database related.

For more information about how to troubleshoot errors, see SQL Server Books Online. For more information about a specific problem, search the Knowledge Base at http://search.support.microsoft.com/kb/c.asp?fr=0&SD=GN&LN=EN-US, or see the SQL Server online newsgroup (microsoft.public.sqlserver.server) at http://www.microsoft.com/sql/community/newsgroups/default.asp.

Monitor to Identify Areas for Improvement

When you are not sure what you are looking for, it is best to use several tools on some general settings in order to capture a broader range of data.

Depending on the situation, you may choose to monitor the server locally or remotely (meaning that you are either running the tools on the same server that you are monitoring, or you are running them from a different server). Here are some things to keep in mind:

  • Local monitoring can exploit a shared memory connection and avoid network I/O for large amounts of trace information. Choose local if you are regularly monitoring most of your servers, simultaneously or with a large number of counters.

  • Remote monitoring may be best if you have eliminated network problems from your list of potential suspects, and if you are monitoring a few computers at a time. Ensure that your network can easily support remote monitoring, and that latency is low so you get accurate results.

When setting up System Monitor, it is a good idea to set the sample interval to between one and three minutes. The default value is one second, which has an impact on the server under observation. Setting the value too low generates a lot of data without providing much more useful information. A larger interval also allows for a steadier reading, and is better for initial analysis. Keep in mind that the more counters you monitor, the more overhead you will incur, the larger a saved log file will become, and the more data you will have to import and analyze later on.

You should establish a baseline set of counters that you monitor regularly, and this may be a separate process from optimization monitoring, depending on your baseline/optimization goals. To run a baseline, run the System Monitor for a brief period on a regular basis, so you can report on trends over time. Consider importing the counter log data into a SQL Server table, so you can use either stored procedure reports or Analysis Services to show trends. Here is a good sample set of counters you could use for baseline monitoring:

\Memory\Available MBytes
\Memory\Pages/sec
\Memory\Cache Faults/sec
\Memory\Page Faults/sec
\Processor(_Total)\% Processor Time
\Processor(_Total)\Interrupts/sec
\Process(sqlmangr)\% Processor Time
\Process(sqlservr)\% Processor Time
\System\Processor Queue Length
\System\Context Switches/sec
\Processor(_Total)\% Privileged Time
\Processor(_Total)\% User Time
\PhysicalDisk(_Total)\% Disk Time
\PhysicalDisk(_Total)\% Idle Time
\PhysicalDisk(_Total)\Avg. Disk Queue Length
\PhysicalDisk(_Total)\Current Disk Queue Length
\PhysicalDisk(_Total)\Disk Reads/sec
\PhysicalDisk(_Total)\Disk Writes/sec
\PhysicalDisk(_Total)\Avg. Disk sec/Read
\PhysicalDisk(_Total)\Avg. Disk sec/Write
\SQLServer:Buffer Manager\Buffer cache hit ratio
\SQLServer:Databases(_Total)\Transactions/sec
\SQLServer:General Statistics\User Connections
\SQLServer:Locks(_Total)\Lock Requests/sec
\SQLServer:Locks(_Total)\Lock Timeouts/sec
\SQLServer:Locks(_Total)\Number of Deadlocks/sec

There is a sample "Baseline.htm" located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement. And for some useful tips on monitoring, see "Setting up a monitoring configuration" in System Monitor online help.
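
If you load the counter data into a table such as the hypothetical one below (DTS or BULK INSERT are typical ways to bring in the saved log files), simple aggregate queries provide the trend reports mentioned above.

    -- Hypothetical table for accumulated baseline counter samples.
    CREATE TABLE dbo.BaselineCounters (
        SampleTime   datetime      NOT NULL,
        ServerName   sysname       NOT NULL,
        CounterName  varchar(255)  NOT NULL,
        CounterValue float         NOT NULL
    )

    -- Weekly trend for one counter across all samples collected so far.
    SELECT DATEPART(year, SampleTime) AS yr,
           DATEPART(week, SampleTime) AS wk,
           AVG(CounterValue)          AS avg_value,
           MAX(CounterValue)          AS max_value
    FROM dbo.BaselineCounters
    WHERE CounterName = '\SQLServer:Buffer Manager\Buffer cache hit ratio'
    GROUP BY DATEPART(year, SampleTime), DATEPART(week, SampleTime)
    ORDER BY yr, wk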

Wherever feasible, you should first record a trace and play it back on a similar server (such as the standby server or a test server). By doing this you can search for errors using all available counters, with no impact on production. Running on a test server, you can also experiment with changes more easily. If you use a standby server for this, make sure you do not change anything that might disrupt any disaster recovery plans. Obviously you will only be able to get some measurements on the production system. However, as a standard practice, always minimize the impact to production.

If you know exactly what type of problem you have, you can narrow down to just the monitoring that applies to that issue. By running several tools in coordination, you get a time slice of the data that reveals a total picture of all activity related to your research. When using multiple tools, it is vital to run them simultaneously if you want a clear picture of everything happening on the server. All of the SQL Server tools discussed in this chapter can be scheduled and run from a job for the same period of time.

  • System stored procedures, such as those for current activity and locking, can be scheduled like any other procedure, and the output redirected to a file and later imported to a table for ease of analysis. 

  • SQL Profiler: If you wish to schedule a trace, you need to use the trace-related set of system stored procedures (such as sp_trace_setstatus), and schedule them as SQL Server Agent jobs. To make things easier, you can set up a trace in SQL Profiler; then go to the File menu and click Script Trace to create Transact-SQL for your trace. For more information, see "Creating and Managing Traces and Templates" in SQL Server Books Online. Also, scripts of all the default SQL Profiler traces are located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement. 

  • System Monitor: To schedule a counter log from a SQL Server job, simply create an alert on the SQL Server:User Settable(User Counter 1)\Query counter (10 of these are available), and have it respond to a counter value above 0. Then, in SQL Server, create a job with the following code:

    EXEC sp_user_counter1 1
    

    Note Your job should later set the value back to 0, to avoid setting off the alert accidentally. 

    Setting the counter to 1 will set off the counter log for the defined interval. An example of how to set this up is located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement. A minimal sketch of the job step itself appears after this list.
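
A minimal version of that job step, with an arbitrary 30-second delay before the reset, might look like this:

    -- Raise the user counter to fire the System Monitor alert, give the
    -- counter log time to start, and then reset the counter to 0.
    EXEC sp_user_counter1 1
    WAITFOR DELAY '00:00:30'
    EXEC sp_user_counter1 0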

If you are trying to capture an elusive error condition, try to identify something that happens when the error occurs, or just before it, and use that event to generate an alert that starts the monitoring process job. Also, consider tracing a single user who is experiencing a particular problem, rather than profiling everything on the system. This generates a cleaner representation of the data for analysis.

Monitoring SQL Server in General

When you approach a database system for the purpose of optimization, or in fact for capacity planning or trend analysis, you want to assess system performance as a whole. The purpose of this is twofold. First, you want to get a baseline on how the server is performing. This helps you understand what is normal for your system; it also helps you understand developing trends, which can be an important part of being prepared for the future. Secondly, the overall view of the system will allow you to put the symptoms you have observed (slowness, for example), together with the behavior you can observe through the monitoring tools.

Returning to the scenario mentioned previously, assume that you set up your monitoring tools on your own desktop, and schedule system stored procedures on the server, so you will get a complete time slice.

System Monitor will not provide all the answers you need, simply because not everything can usefully be reported in the form of a counter. You should also examine your system using SQL Profiler or a trace running from a system stored procedure.

You may choose to monitor each focus area in a separate trace file, where the information will already be grouped in a meaningful fashion (depending on how you set it up). Or, you may choose to monitor in general, store the trace to a SQL table, and do your own reporting from there. You should set standards for monitoring a database server. Store all your templates and scripts in a central repository so they are backed up, protected, and readily available to your group. This becomes especially important the larger your team is and the more database servers you have.

Once you have a clear general idea of what is happening on the server, you can begin to focus on the issues that warrant your attention.

In the case of a complaint of slow response time against one of the major systems of your new company, you already know that you can have a much larger impact on the server by improving the application and the database design. Because you are tasked with improving performance without changing the front-end application, you focus immediately on database design.

You choose to monitor for slow queries, blocking issues, and indexing issues, figuring that if you turn up anything these give you a good start on optimizing your system. Because you want everything to run simultaneously, you schedule all of your tools. Because the problem is so urgent that your management has authorized daytime monitoring, you choose a time of moderate activity (based on input from the network administrators and user contacts). You decide to run a full half-hour test this time, although normally you would elect to run two 15-minute periods within an hour.

To monitor for database or application design issues, you decide to set up System Monitor with the following counters:

\SQLServer:Access Methods\Full Scans/sec
\SQLServer:Access Methods\Index Searches/sec
\SQLServer:SQL Statistics\SQL Re-Compilations/sec
\SQLServer:Locks(_Total)\Average Wait Time (ms)
\SQLServer:Locks(_Total)\Lock Waits/sec
\SQLServer:Locks(_Total)\Number of Deadlocks/sec
\SQLServer:Cache Manager(Prepared Sql Plans)\Cache Hit Ratio

Then, you set up a job to run sp_who2 and sp_lock periodically throughout the monitoring period, saving the output to a file. These two procedures can be customized to provide information in a format you find most useful.
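
One way to capture that output, as an alternative to writing it to a file, is to insert it directly into a holding table from the scheduled job. The table below is hypothetical, and sp_who2 can be captured the same way with a table that matches its result set.

    -- Hypothetical snapshot table for periodic sp_lock captures.
    CREATE TABLE dbo.LockSnapshot (
        CaptureTime datetime NOT NULL DEFAULT (GETDATE()),
        spid int, dbid int, ObjId int, IndId int,
        Type varchar(10), Resource varchar(32), Mode varchar(16), Status varchar(16)
    )

    -- Job step, scheduled every few minutes during the monitoring window.
    INSERT INTO dbo.LockSnapshot (spid, dbid, ObjId, IndId, Type, Resource, Mode, Status)
    EXEC sp_lock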

Now run two traces, one for the Duration template and one for the Replay template. The Duration template generates a list of Transact-SQL statements and the time each took to complete. The Replay template can later be used for the Index Tuning Wizard.

Analyzing the Results: Database and Code Level

When you are starting this process, a seemingly endless number of things could be wrong, and you have many different ways to observe these problems. The key is to choose valid monitoring parameters that you understand well, and then to focus your priorities on the areas where you can have the greatest impact.

First, look at the data and code levels, knowing that this is the best way to get a performance increase. Technically, the biggest gain is to be found in optimizing the application and the code it rests on. It is also worthwhile to examine the indexing, however. Very often, the indexing originally done for the application was not sufficiently tested; in addition, the usage of the application usually shifts over time, especially as enhancements are introduced and the volume of users and transactions increases.

You may find that your initial monitoring efforts uncover a number of problems to which you can turn your attention. However, to be most effective, it is best to choose a few and concentrate on those, saving the other discovered issues for the next optimization cycle. The idea here is that every time you do any tuning, the system is altered, so you want to get a new monitoring report before making a decision on what the next priority should be.

Blocking Based on Database Design

Looking in SQL Profiler at the trace file you created through sp_trace_create, you convert the Duration value from milliseconds to seconds, and discover several queries that run for over one minute. One of them appears many times on your list; another appears only once, but takes far longer. You should select the most frequently used, longest-running queries for your first optimization pass on the code, because this will have the greatest impact on the system. The other query, although it takes longer to run, is used very infrequently, and should be left as the lowest optimization priority.
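
Assuming the trace has been loaded into a table (the name here matches the earlier hypothetical import), a query along these lines surfaces both how slow and how frequently each statement runs:

    -- Long-running statements from the imported trace, with Duration
    -- converted from milliseconds to seconds.
    SELECT CONVERT(varchar(400), TextData) AS statement_text,
           COUNT(*)                        AS executions,
           AVG(Duration) / 1000.0          AS avg_seconds,
           MAX(Duration) / 1000.0          AS max_seconds
    FROM dbo.TraceAnalysis
    WHERE Duration IS NOT NULL
    GROUP BY CONVERT(varchar(400), TextData)
    ORDER BY avg_seconds DESC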

This is an important distinction to make. The user who is running the very slow query may well call in to complain of the slowness. Meanwhile, all of the users may be suffering because of general system slowness caused by the queries that run very frequently (such as those that populate the main screens). A complaint of general slowness is very difficult to handle from a help desk perspective; a single user with a complaint about an easily identifiable report is much easier to address.

Since you already suspect locking in the sample scenario, you look at the data you collected with sp_who2 and sp_lock. Sure enough, the procedure you chose from the trace appears often in the sp_lock output, and the same SPIDs show up as blocked processes in sp_who2, with the blocking SPID listed in the BlkBy column.

Note Trace Flag 1204 can be used to get more information on the type of locks participating in the deadlock and the current command affected.

Studying sp_lock, you see the following results as representative of the pattern that is common for many of the spids listed:

spid   dbid   ObjId                IndId   Type   Resource    Mode    Status
76     20     usp_productbytype    0       TAB                Sch-S   GRANT
76     20     usp_priceit          0       TAB    [COMPILE]   X       GRANT
76     20     usp_priceit          0       TAB                Sch-S   GRANT
76     20     usp_productpromote   0       TAB                Sch-S   GRANT
76     20     usp_priceit          0       TAB    [COMPILE]   X       WAIT
76     20     usp_priceit          0       TAB                Sch-S   GRANT
77     20     usp_productpromote   0       TAB                Sch-S   GRANT
77     20     usp_priceit          0       TAB                Sch-S   GRANT
77     20     usp_priceit          0       TAB    [COMPILE]   X       GRANT

Comparing this to the data from sp_who2, you focus in on the spids 76 and 77. (Note that this sample is missing a few columns for purposes of display.)

SPID   HostName   BlkBy   Command            CPUTime   DiskIO
62     WRKSTN07   64      EXECUTE            25797     273
63     WRKSTN08   .       AWAITING COMMAND   15531     310
64     WRKSTN08   89      EXECUTE            2563      324
65     WRKSTN08   64      INSERT             8344      461
69     WRKSTN07   64      EXECUTE            17188     531
70     WRKSTN08   64      EXECUTE            103281    141
71     WRKSTN07   64      INSERT             13438     76
72     WRKSTN08   .       AWAITING COMMAND   46469     6
73     WRKSTN08   .       AWAITING COMMAND   14281     270
74     WRKSTN08   64      EXECUTE            13109     305
75     WRKSTN07   .       AWAITING COMMAND   25375     232
76     WRKSTN07   64      EXECUTE            68094     367
77     WRKSTN07   64      INSERT             14328     419

Comparing this to information gathered in System Monitor, you see:

\SQLServer:SQL Statistics\SQL Re-Compilations/sec
    This is more than 1,000. You make a note to investigate other stored procedure code and look for embedded or ad hoc SQL queries, once the current optimization effort is complete.

\SQLServer:Locks(_Total)\Average Wait Time (ms)
    The average wait time is substantially more than 30 seconds.

\SQLServer:Locks(_Total)\Lock Waits/sec
    This number is high, but this is consistent with the blocking.

\SQLServer:Locks(_Total)\Number of Deadlocks/sec
    This is very low in proportion to the other lock counters. Fortunately, although a blocking issue exists, it looks as though the tables are processed in a consistent sequence.

Open the code and trace through the logic. For this example, assume that you find a stored procedure that creates a temporary table of items to price, populates it, and then executes a nested stored procedure to individually update the price of each item in the temporary table. This type of design causes numerous recompiles, which in turn cause the blocking issue.
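
In rough outline, the design just described looks something like the following. The procedure names echo the sp_lock output above; the table and columns are invented for illustration.

    CREATE PROCEDURE dbo.usp_productbytype @ProductType int
    AS
    BEGIN
        -- Temporary table of the items to be priced.
        CREATE TABLE #ItemsToPrice (ItemId int NOT NULL, Price money NULL)

        INSERT INTO #ItemsToPrice (ItemId)
        SELECT ItemId FROM dbo.Product WHERE ProductType = @ProductType

        -- The nested procedure updates each row's price one at a time. Because
        -- it references a temporary table created by its caller, it recompiles
        -- on every execution, and concurrent recompiles serialize on the
        -- [COMPILE] exclusive lock seen in the sp_lock output.
        EXEC dbo.usp_priceit

        SELECT i.ItemId, i.Price, p.ProductName
        FROM #ItemsToPrice i
        JOIN dbo.Product p ON p.ItemId = i.ItemId
    END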

The first step in fixing this is asking what went into choosing the design. As a DBA, your first instinct might be to get rid of the temporary table and consolidate the code into one procedure. However, you must understand the reasons behind the design, the system requirements, before you begin. For the example, assume you learn the following: 1) The nested procedure was created to simplify maintenance, because the same pricing routine is used in numerous other procedures; and 2) the pricing is done dynamically because prices change during the day.

Both of these are valid reasons for the design. But there is often more than one way to provide a solution, and in this case you can meet the requirements and still provide some optimization. In the end, all of performance tuning is an act of balancing needs: Speed, ease of administration, and cost are all factors that must be balanced, meaning that if you go for an extreme in one area, you must be willing to make some sacrifices in another.

In this case, to fix the problem, you could look for a way to eliminate the nested procedure and price the items without the temporary table. This means that the same pricing routine must be incorporated into all the calling procedures. It simplifies the procedure and eliminates the blocking issue. How do you keep the goal of simplifying code maintenance? With documentation. Put a block of comments in every procedure to explain that the code fragment is repeated throughout the database, and use a code-scanning procedure to locate every copy when the routine changes. An example of such a procedure, sp_searchcode, is available on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement.
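
To find every procedure that carries the shared fragment, you can also search the procedure text for the comment marker; a simple version of such a search (the marker string is hypothetical) is:

    -- Find all objects whose source contains the shared comment marker.
    SELECT DISTINCT OBJECT_NAME(id) AS object_name
    FROM syscomments
    WHERE text LIKE '%Shared pricing routine%'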

Another alternative is to look for a way to avoid pricing the items dynamically. Perhaps the price column could be added to the product table, and any re-pricing would simply include a process that updates this column. This design would offer far better performance, especially given that the process that should have priority is the one that queries the data repeatedly. Even if the items are re-priced frequently during the day, the frequency of sessions doing pricing will never reach the frequency of sessions searching for items. So, you add speed by denormalizing in the area of high reads: you store the extended or discounted prices with each item, rather than calculating them on every query. You keep the ability to change prices, but the process of changing a price will be a bit slower because it now includes updating the product table as well. If that slowness is also an issue, you could consider queuing the requests to change prices. Once the real business priorities become clear (in this case, that the customer should not wait), you can more accurately choose a design that fulfills your needs.
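
Under that design, re-pricing becomes a set-based update of a stored column rather than a per-item calculation at query time. Sketched with hypothetical table and column names:

    -- One-time schema change: store the extended price with the product.
    ALTER TABLE dbo.Product ADD ExtendedPrice money NULL

    -- Re-pricing process: recalculate the stored price in a single update.
    -- The formula stands in for whatever the real pricing rules are.
    UPDATE dbo.Product
    SET ExtendedPrice = ListPrice * (1.0 - DiscountPct)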

For more information on blocking issues, see "Understanding and Avoiding Blocking" in SQL Server Books Online.

Slowness Due to Indexing Schema

Sometimes a problem has multiple causes. In this case, a lack of proper indexes could also contribute to the blocking issues. Indexes help the database select the data for result sets; if a query has no useful index and must scan an entire table, it will wait behind any session holding a lock on a row in that table.

You should give regular attention to your indexing schema. In the area of database maintenance, poor indexing is the most common cause of slow queries, and it is also the easiest to fix. Correcting the indexes for a query can have a near-magical effect on increasing query speed.

The most obvious symptom that indexes are in the wrong place is slow response to frequently used queries. Other things can cause this, of course, but the lack of proper indexes should be a prime suspect. Conversely, if you are getting slow response on inserts and updates, look at how many indexes are on the target table and how wide (in columns) they are. Too many indexes can slow data changes. For this reason, you will also want to eliminate indexes that are never or rarely used by queries, or that provide only minor benefit to queries that are infrequently run.

Note If you have a system where you would like to index most of the columns, because you feel they will all be used, you should use the auto create statistics and auto update statistics database options. The query processor can use this statistical information to determine the optimal strategy for evaluating a query. For more information, see "Statistical Information" in SQL Server Books Online.
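
Both options can be checked or enabled per database, for example (the database name is a placeholder):

    -- Enable automatic creation and updating of statistics.
    ALTER DATABASE Orders
    SET AUTO_CREATE_STATISTICS ON, AUTO_UPDATE_STATISTICS ON

    -- Confirm the current settings.
    SELECT DATABASEPROPERTYEX('Orders', 'IsAutoCreateStatistics') AS AutoCreate,
           DATABASEPROPERTYEX('Orders', 'IsAutoUpdateStatistics') AS AutoUpdate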

For the sample scenario, assume that you have the following counter values, indicating that you should check the index placement:

\SQLServer:Access Methods\Full Scans/sec
    This database has many full scans; a rate of 3 per second is recorded as the average.

\SQLServer:Access Methods\Index Searches/sec
    There are fewer of these, but you may suspect that the blocking issue is artificially lowering the number.

How do you determine the proper index placement? The Index Tuning Wizard included with SQL Server can give you immediate feedback on your indexing schema, based on some fairly complex analysis of the data volume, distribution, and usage. This tool, which can also be run from the command line as itwiz, should be a regular part of the optimization cycle, run on a standby server with a trace from the production server.

As your system changes over time, you should run the Index Tuning Wizard to evaluate how current usage has changed the best indexing schema. This can be run on one query, in Query Analyzer, or on a trace script (use SQL Profiler template SQLProfilerTSQL_Replay.tdf, provided in Transact-SQL form on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement).

For some complex types of queries or design requirements, you may find that you still want to do some indexing analysis without the tool; for example, if you have a distributed process that you want to control, or if you have queries that must be as fast as possible even though they will not appear significant in a trace. If you have cases like this, still run the wizard so you have additional automated recommendations to compare against. For information about indexes and their options, see "Create Index" and "Index Tuning Wizard" in SQL Server Books Online, and the MSDN white paper "Index Tuning Wizard for Microsoft SQL Server 2000" at http://msdn2.microsoft.com/library/.

Data Storage Component Issues

It is also important to address data storage issues, including both the data file layout and the hardware configuration. This section will focus on problems that can be caused by inappropriate disk layout. For information on filegroup layouts, see "Optimizing the Storage Components" in this chapter.

If you examine the server and discover that you have poor throughput even on a reasonably large server, you should look at some System Monitor counters to get more information. In this example, several issues requiring fixes have already been found. At the server layer, assume that the sp_configure settings are still the defaults, so they are not the first place to look. Instead, check the hardware to make sure it is being used optimally before taking any action on the server configuration.

Assume that you go back to the example server, armed with all the information you have gathered, run some counters related to page faults, and observe the following.

\Memory\Pages/sec
    A large amount of hard paging is taking place. The average value is more than 70.

\Memory\Page Faults/sec
    A large amount of soft paging is indicated; the average value is more than 250.

\SQLServer:Cache Manager()\Cache Hit Ratio
    The average amount is 65 for Prepared SQL Plans, Procedure Plans, and Adhoc SQL Plans.

The System Monitor log shows that the page faults per second are very high. In addition, the SQL Server cache hit ratio is proportionally very low. This indicates a need for more memory. Memory\Pages/sec should be as close to 0 as possible; the higher the value, the more hard paging is taking place. Hard paging means the operating system goes to disk to resolve memory references. The operating system will always incur some paging, because this is how it manages memory, but excessive paging uses disk I/O and CPU resources, which can intensify slowness in the database. Soft faults consume CPU resources and are generally not as bad for performance as hard faults, which consume disk I/O resources. The combination of the two can intensify a variety of other performance problems.

To be thorough, you should go ahead and look at all the server resources on a general level, to get a better picture for what is going on.

\Processor(_Total)\% Processor Time
    The processors are being well utilized, but there are frequent spikes over 80 percent.

In this case, you can see a significant amount of processor utilization. Spikes appear above 80 percent, so an extra processor might be in order. The application should be tuned first, however, and the paging problem investigated as well.

\PhysicalDisk(_Total)\Current Disk Queue Length
    A significant amount of queuing is taking place; the amount frequently spikes to a value of 21.

In this example, the amount of disk queuing is a bit high for the system. Five physical disks are configured in a RAID 5 array, so with the equivalent of four disks handling I/O, the queue length should never be higher than 8 (roughly two outstanding requests per disk). Here it spikes to 21.

You have several choices for looking at disk queuing in System Monitor. Logical disk counters are associated with the logical drive letters assigned by Disk Administrator, whereas physical disk counters are associated with what Disk Administrator sees as a single physical disk device. What looks like a single physical device to Disk Administrator may be either a single hard disk drive, or a RAID array, which consists of several hard disk drives.

The difference between the physical and logical disk layout is important. Understanding exactly which disks you mean when you say "Drive E" is absolutely crucial to being effective in monitoring the storage components of a database server. Equally important is understanding which database-related files are on which drive. In the case of the example, the operating system is on the C drive, the data is on the D drive, and the log is on the E drive. However, each of these is a logical drive (a partition) configured on the same physical disk array. So, while the three drives (C, D, and E) appear to be separate, all of the files (paging file, data files, and log files) are in fact on the same physical disk array.

A good practice is to always use the physical disk counters. These are already on by default on Windows 2000 systems. By relying on the physical counters only, you also save some driver overhead by not collecting the logical disk statistics. Plus, using the physical disk counters will help make it clearer how the drives are physically laid out.

In this real-world example, many different problems are found in the same system, and they combine to make each other worse. In this case, the disk queuing is likely to be worse because frequent writes are made to the order system (random I/O) in addition to reporting queries (sequential I/O). In addition, this system seems to need more memory, so data flows through the cache before it can be reused. However, you have already identified coding problems and indexing problems that could also be aggravating the hardware utilization problems. It is far more efficient to optimize the application first, and then adjust the hardware. If you always do it the other way around, you will soon find yourself in the same bind, because the design problems will continue to grow in magnitude as the system grows in volume.

For the example server, you might get some gain from reconfiguring the array as RAID 0+1 to reduce some of the I/O overhead. If the amount of memory on the server were less than 512 MB, you might also look at purchasing more memory. Other than these minor adjustments at the storage component level, however, you are much better served by turning your attention immediately back to the design issues.

For information on identifying hardware problems, see "Identifying Bottlenecks" in SQL Server Books Online.

Other Issues for Optimization

Regarding the example scenario, it would be a good idea to move the reporting application onto a different server. Decision Support Systems (DSS) and Online Transaction Processing (OLTP) systems make dissimilar use of system resources such as CPU, memory, and disk. A DSS system generally accesses data by retrieving large numbers of rows for the purpose of analyzing the data. This involves sequential I/O. An OLTP system, on the other hand, usually hits one or just a few rows from a single table, a random I/O process. The two types of I/O have different and sometimes opposing optimization goals and tactics. They do not belong on the same disk. Some reads are natural to OLTP. Very few systems are pure data entry; many of them involve selecting items from a list, or viewing minor reports (not DSS scale). In a larger (higher volume) system, you might consider separating out the data into different filegroups on different disks by access pattern. However, if the DSS system and the OLTP system are both heavily used during the same hours, the strain on the resources will necessitate separating the systems onto different resources; the easiest way to do this is to put them on different servers.

Moving the reporting system off to another server to reduce the strain of reporting demands on the ordering system would be easy to justify, since looking at how well the business is doing should not have an impact on how well it does. You could estimate the performance gains by doing a trace for Replay on the OLTP application, and running that script against a test server of similar hardware configuration. The performance numbers you could get by monitoring this test environment (one without reporting interference) would tell you whether your plan would be worthwhile. System Monitor graphs could be included in a brief report, along with an estimate for the tasks involved in making the change.

Looking for table scans in the monitoring tools is a good idea, and it will often lead you to problems that can be corrected by appropriate indexing. However, you should also keep in mind that the query processor chooses the best plan possible. If you have a very small table of very few rows, reading the whole table is going to be faster than using an index; in fact, putting an index on such a small table would be detrimental. Also, if you are going to be reading a large number of rows from the table, or if the index column only has a couple of unique values, a table scan will be more efficient. It is important to understand how SQL Server works behind the scenes, if your optimization efforts are to be effective.

Note If you do not feel that you have sufficient evidence to support a change you think needs to be made, or if you are having difficulty isolating a problem, consider returning to your general monitoring, but with all counters being recorded. Be very careful of the performance considerations of running such a large number of counters, especially if you choose a short update interval. This is sometimes the only way to find a relevant pattern in the performance data, however.

Finally, remember that the database server does not exist in isolation, and can be subject to problems that are not SQL Server related. A good example of this is network issues. For more information on network issues that relate to SQL Server, see the chapter, "Configuring Microsoft SQL Server on the Network" in Microsoft SQL Server 2000 Administrator's Companion. Also see "Monitoring Network Activity" in System Monitor online help.

Managing the Changes

It is important to be careful in fixing the issues that appear to be the cause of performance degradation. Fix one thing at a time and observe the system again, so you will know the effect of your change. This is useful for documentation purposes if your fix works. If, however, your change causes a larger or a different problem, it would be better to know exactly what you changed. That way, you could remove the specific change, rather than having to remove a large set of changes to see what caused the new problem. Another advantage is that you could make small changes more frequently, and thus see a continuous gain in performance. This is the goal of the optimization cycle described in this chapter.

Optimizing the Data Components

A logical model often does not translate directly into a physical model, because the physical model must also be influenced by how the set of tables will be used. You must find a balance between a normalized and a denormalized design, so the database structure provides the required functionality while meeting requirements for performance and scalability.

Stored procedures and functions assist in a single implementation of logic across your application—they are designed and coded once—and provide a layer of abstraction which allows you to better manage the underlying data. For added administrative flexibility, you can build your stored procedures on top of views of the tables. This is especially useful in any system which is likely to grow or change over time, or where you would like to have control over the physical database structure for future optimization or enhancement opportunities.
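
Here is a minimal sketch of this layering, using a hypothetical Orders table and hypothetical view and procedure names. The application calls only the procedure, which reads from the view, so the underlying table can be restructured later without touching the front-end code:

    CREATE VIEW dbo.vOrders
    AS
    SELECT OrderID, CustomerID, OrderDate, Status
    FROM dbo.Orders
    GO

    CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
        @CustomerID int
    AS
    SELECT OrderID, OrderDate, Status
    FROM dbo.vOrders
    WHERE CustomerID = @CustomerID
    GO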

The needs at the structural level will almost always change over time. The more levels of abstraction (views, procedures) you have between the front-end and the physical tables, the more flexibility you will have in adapting the physical design to meet performance goals. The ability to manage the flow of data through the physical structures is crucial to adaptive administration of the database, and should be a design priority on larger systems.

Probably the greatest benefit of using views from a performance perspective is that it allows you to alter the underlying table structure without altering the front-end application. Using simple views provides you with the option to do that if you wish, so you can migrate your system from one design to another with minimal impact. However, your ultimate goal should still be to have a good table layout, as you cannot fix a bad one exclusively by using views.

The usage of the system will change the indexing schema requirements as well. It is a good idea to develop a plan for your indexes, including fillfactor settings, policies on placement of indexes, and a standardization of names. Here are some tips on good indexing:

  • Try to use narrow indexes (on just one column). This provides the most efficient indexing scheme.

  • Base indexes on WHERE clauses, joins, and GROUP BY statements, in that order of priority. 

  • Choose a non-clustered index for values where there are many unique values in that column (for example, an identity column, or a name column). 

  • Consider using indexed views as a means of instantiating alternate clustered indexes on a table (see the sketch following this list).

  • The best way to find the optimal indexing schema for your database is by using the Index Tuning Wizard with a sample workload. You can collect the workload information by using the replay template in SQL Profiler. Do this regularly to adapt the database to changes in usage. 
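
The following sketch shows the indexed view idea mentioned in this list, using a hypothetical Orders table. The view must be created WITH SCHEMABINDING (and with the SET options required for indexed views), and its first index must be a unique clustered index; the result behaves like an alternate clustered ordering of the same data:

    CREATE VIEW dbo.vOrdersByRegion
    WITH SCHEMABINDING
    AS
    SELECT OrderID, ShipRegion, ShipCity
    FROM dbo.Orders
    GO

    -- The unique clustered index stores the view's copy of the data ordered by region.
    CREATE UNIQUE CLUSTERED INDEX IXV_OrdersByRegion
    ON dbo.vOrdersByRegion (ShipRegion, ShipCity, OrderID)
    GO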

You should also develop a plan for maintaining indexes. Several utilities are available for maintaining and analyzing indexes. Make sure you also update statistics; it is a good idea to use the database options for auto create statistics and auto update statistics. For information on index utilities and statistics, see SQL Server Books Online.
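
For example, a periodic maintenance job might check fragmentation, rebuild or defragment the indexes, and refresh statistics. The database, table, and index names below are hypothetical; choose either DBCC INDEXDEFRAG (online) or DBCC DBREINDEX (full rebuild) depending on your maintenance window:

    DBCC SHOWCONTIG (Orders)                                    -- report fragmentation
    DBCC INDEXDEFRAG (SalesDB, Orders, IX_Orders_CustomerID)    -- defragment one index online
    DBCC DBREINDEX ('SalesDB.dbo.Orders', '', 90)               -- or rebuild all indexes with a fillfactor of 90
    UPDATE STATISTICS Orders
    GO

    -- Leave the automatic statistics options turned on.
    EXEC sp_dboption 'SalesDB', 'auto create statistics', 'true'
    EXEC sp_dboption 'SalesDB', 'auto update statistics', 'true'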

Optimizing the Code Components

To get the best design, you need to do some performance analysis. Usage changes everything. Design is an iterative process. Your design should be able to adapt to changes in the system, just as your configuration does. Clearly, changes to the database objects can involve change to the front-end code as well, so these adjustments should be planned carefully.

It is also important to establish priorities when choosing which parts of the design to alter first. Some of these are not so obvious; for example, when you are tuning stored procedures, shaving one second off a two-second procedure that runs hundreds of times per second is a much better use of your time than shaving 30 minutes off a one-hour procedure that runs once a week.

Here are some tips that can give you an idea of what to be aware of in considering the best design.

  • Always use stored procedures and user-defined functions for data access if at all possible. Because these objects are compiled on the server, they can provide on the order of 40 percent higher throughput for the system as a whole. For scalability, you can also move much of the business logic into stored procedures to prevent unnecessary round trips over the network. The main consideration here is that you want to reduce network traffic, but you must also keep the transactions as atomic as possible.

  • The best security plan is to use application roles, and grant them execute permissions on the stored procedures (see the sketch following this list). The users and the application should never have direct access to the underlying views and tables. This is crucial for good security, and will protect your data from being accessed through query tools other than the application front-end. This approach also has other benefits: you can rely on stored procedures to enforce business rules, and you also protect your ability to manage the physical structures as needed without fear of damaging any unauthorized, yet mission critical, ad hoc reports.

  • An easy way to collect meta data on stored procedure usage is to add a little piece of audit code to each of them. This can simplify trend analysis, and at the same time make it far more accurate because you will have all the data, not just a small sample of it. Examples for this are available on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\DBManagement.
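
Here is a minimal sketch of the application role approach described in this list, using hypothetical role, password, and procedure names:

    -- Create the application role and grant it execute permission on the procedures only.
    EXEC sp_addapprole 'OrderEntryApp', 'StR0ngPassw0rd'
    GRANT EXECUTE ON dbo.usp_GetOrdersByCustomer TO OrderEntryApp
    GRANT EXECUTE ON dbo.usp_InsertOrder TO OrderEntryApp
    GO

    -- After connecting, the application activates the role; the user's own permissions no longer apply.
    EXEC sp_setapprole 'OrderEntryApp', 'StR0ngPassw0rd'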

If you think of stored procedures and functions as groups of SQL statements compiled into a single execution plan, then you will have the right perspective for optimizing them. Design your code to maximize the effectiveness of the caching mechanisms for execution plans by following these tips:

  • Ensure all users execute with the same environmental settings. The environment is determined by server settings (sp_configure), database settings (sp_dboption), and connection settings (SET option). Don't change SET or database settings in the middle of an application or connection. 

  • Ensure batches and procedures do not require implicit resolution. The best way to do this is to use complete, explicit four-part names for all objects referenced in code: For instance, rather than selecting from "authors", select from "server1.pubs.dbo.authors". This will avoid unnecessary recompilations to resolve the name of the object. 

  • As an extra measure for optimizing the use of stored plans, study the effect of different parameter values on the stored procedure. Base a stored procedure on a range of data rather than just one ID or value. For example, use this code: 

    select * from tablename where value1 between @x and @y
    

    You will get a different optimal execution plan if @x and @y are very close in value (needing an index lookup) than if they are widely separated (requiring a full scan). There is a trick to storing more than one execution plan for the same query:

    EXEC sp_executesql N'select * from T where c >= @p1 AND c <= @p2
    --the index lookup plan', N'@p1 int, @p2 int', 5, 10

    EXEC sp_executesql N'select * from T where c >= @p1 AND c <= @p2
    --the tablescan plan', N'@p1 int, @p2 int', 5, 95
    

    The comment at the end of the query (for example, "--the index lookup plan...") will cause SQL Server to store a separate plan for each of these queries, even though the query itself is the same.

  • Too many recompiles can seriously degrade processing performance, even with a large server. You should make sure that you generate a recompile only when one is necessary for optimal execution. When you are examining options such as WITH RECOMPILE or the use of EXEC, look at the negative effect these have in relation to the total system. Excessive recompilations are far more expensive than an occasional poor execution plan for a rarely requested query.

  • In general, you should always use sp_executesql rather than EXEC to run a dynamically created query. The reason for this is that sp_executesql has a greater possibility of reusing a cached execution plan; EXEC will cause a recompile every time. So for example, if you are calling such a procedure in a loop, the use of EXEC will cause one recompile per loop, whereas sp_executesql will not. On a large scale, recompiles can seriously degrade performance. For more information see "Using sp_executesql" in SQL Server Books Online.

  • Instead of using dynamically built queries, consider writing multiple procedures to cover the scenarios that require different execution plans.

  • Avoid using temporary tables if you can use a table variable instead. Table variables can be more efficient. In any case, avoid sharing a temporary table with a nested stored procedure in any procedure that is called frequently by the system.
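
For instance, a procedure that builds a small working set might use a table variable rather than a temporary table (the table and columns here are hypothetical):

    DECLARE @RecentOrders TABLE (OrderID int PRIMARY KEY, OrderDate datetime)

    INSERT @RecentOrders (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate >= DATEADD(day, -7, GETDATE())

    SELECT COUNT(*) AS RecentOrderCount FROM @RecentOrders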

For more information see "Transact-SQL Tips" and "Query Tuning" in SQL Server Books Online.

Optimizing the Storage Components

In SQL Server, the way that data files/filegroups and log files are laid out on disk contributes to database performance. Data and log files should always be separated onto different physical disks. tempdb, which has changed in SQL Server 2000, will provide better performance when placed on a large array of disks. Both the transaction log and tempdb benefit from being placed on fast drives.

Database File Placement

Here are a few tips and best practices regarding the placement of your database files:

  • The most important considerations in determining where to place your files on the server are the number of disks (spindles) available to a particular drive and the speed of the drives involved. For this reason, it is good to design the server layout with your database needs in mind. 

  • If you have a set of tables that are used together frequently, you should consider putting these tables on separate filegroups on separate physical drives to balance I/O between them (see the sketch following this list). In a larger, more heavily used system, this can make a significant difference.

  • Consider putting non-clustered indexes in a separate filegroup, in order to split I/O between filegroups. 

  • Group your tables based on usage, to generate as many simultaneous reads to different filegroups (and therefore disks) as possible. Grouping tables into filegroups based on a maintenance need for convenient backup plans will not generate as much performance as separating the tables and indexes by usage.

  • For smaller systems, use autogrow for your database files, but keep in mind that when a "grow" is initiated, transactions must wait while the database grows. In a small database or lightly queried system this is not a big issue, but if you have a 100 GB OLTP database set to grow in 10 percent increments, and it runs out of space during peak times, the online users will be held up while the 10 GB is allocated.

  • For a larger system the best practice is to anticipate database growth, and manually increase the database at a scheduled time. Or, choose a reasonable amount to grow by that is neither too cumbersome nor so small that it will initiate expansion too frequently. 

  • If you have multiple files in a filegroup, you will need to expand all of them in order to reestablish proportional fill.
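
Here is a sketch of the filegroup approach described in this list, using hypothetical database, file, and drive names. The new filegroup is placed on a separate drive (and therefore a separate set of spindles), and a heavily used table is created on it:

    ALTER DATABASE SalesDB ADD FILEGROUP OrderData
    GO
    ALTER DATABASE SalesDB
    ADD FILE
        (NAME = SalesDB_OrderData1,
         FILENAME = 'I:\Data\SalesDB_OrderData1.ndf',
         SIZE = 2GB,
         FILEGROWTH = 500MB)
    TO FILEGROUP OrderData
    GO

    CREATE TABLE dbo.OrderDetail
        (OrderID int NOT NULL,
         LineNumber int NOT NULL,
         ProductID int NOT NULL,
         Quantity int NOT NULL)
    ON OrderData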

For more information see "Using Files and Filegroups" in SQL Server Books Online.

Log File Placement

Here are a few tips and best practices regarding the placement of your log files:

  • Create the transaction log on a physically separate disk or RAID array. The transaction log file is written sequentially; therefore, using a separate, dedicated disk allows the disk heads to stay in place for the next write operation. For this reason, smaller systems will do well by using a single mirrored disk for the transaction log. A single mirrored physical disk should support up to approximately 1,000 transactions per second, depending on the speed of the disk itself. Systems requiring more than that should stripe the transaction log across a RAID 0+1 array for maximum performance. For highest bandwidth, the RAID controller on this array should have a (battery backed) write-back cache to speed log writes.

  • Set your transaction log to autogrow, but try to size it so that it should not need to grow (see the sketch following this list). The optimal size should be based on your recovery model, the level of logged activity in the database, and the interval of time between backups. Set the growth increment to a reasonable percentage, but try to anticipate when the log will need to be resized. If the transaction log expands too frequently or takes a long time to expand, performance can be affected.

  • The size of the log should be based on your current recovery model and your application design. If you find that you need to shrink the log periodically, you should further investigate what is causing the log to fill up, in order to fix the problem at the root rather than simply fixing the symptom.
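
A sketch of presizing the log so that autogrow remains a safety net rather than a routine event (the database name, logical file name, and sizes are hypothetical):

    ALTER DATABASE SalesDB
    MODIFY FILE
        (NAME = SalesDB_Log,
         SIZE = 4GB,
         FILEGROWTH = 500MB)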

tempdb File Placement

Here are a few tips and best practices regarding the placement of your tempdb files:

  • Place the tempdb database on a fast I/O subsystem to ensure good performance. Stripe the tempdb database across multiple disks for better performance. Move the tempdb database to disks different from those used by user databases.

  • The tempdb database can be located with the data in most situations. Larger systems that make heavy use of tempdb should consider putting tempdb on a set of disks by itself, for purposes of extra performance. It is not a good idea to co-locate any database files with the page file of the operating system.

Other File Placement

Here are some tips and best practices regarding the placement of other files:

  • The operating system should be installed on a single mirrored disk (RAID 1), which provides resilience to disk failure (but have a boot disk ready to boot off the mirror). The page file performs well on the operating system drive, and may be left there for a database server. If you must move the page file, make sure that you do not locate it on any drive that contains a data file, a log file, or tempdb.

  • If you are storing your backups on the same server with your data, make sure not to store them on the same disks with your data files or log files.

Optimizing the Server Configuration

SQL Server 2000 has been enhanced so that much of its performance tuning is handled dynamically. Many settings that formerly had to be adjusted manually, based on usage, are now governed by an algorithm that monitors the server and adjusts the settings on the fly. This automatic process keeps the configuration settings constantly within 10 percent of the optimal setting based on system usage. Settings that relate to system resources can still be adjusted, and once adjusted they will still function dynamically within the bounds set by the DBA.

It is very important not to adjust anything in sp_configure unless you have overwhelming evidence (documentation) that your server issue cannot be resolved by fixing the code or by adjusting your storage deployment. Use the default settings. If you examine your server and find that some settings are not at their default values, leave them at their current values, because custom adjustments may already have been made for a reason. Document these settings carefully, and investigate why they were changed.

Apart from dynamic performance tuning and the important caution about sp_configure, though, an examination of your server settings may be required at times. For instance, if you are supporting third-party software that you cannot adjust, and you have already adjusted the hardware to the highest level, server configuration would be a logical area to look at.

Here are some general tips on adjusting the server configuration:

  • Before you begin, research the issue thoroughly, both online and in relevant books. You will also need to run System Monitor at intervals over a period of time (a month is a good general rule). Save your results to a file, but do not create that file on the same drive as your data or log files. Also, although you want the recording interval to be frequent enough to capture problems, if you run it continually with more than a couple of counters turned on, you could very well cause a performance issue yourself, thus skewing your results. Do not collect more detail than you need. You can always run another test at a finer granularity if you discover a problem that requires closer analysis.

  • Before changing any server setting, always note the original value and the time that you made the change. Always monitor the server after making changes, to see whether you have had the desired effect, or an unexpected one. Do not change too many settings at once.
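
For example, you can capture the current configuration values before touching anything; enabling the display of advanced options changes only which settings are listed, not how the server behaves:

    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    -- Lists each option with its minimum, maximum, configured, and run values; save the output with your documentation.
    EXEC sp_configure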

Exploiting the Hardware


Even when you turn your attention to the hardware, tuning a database server still comes back to the stored procedures, function code, indexes, and the data model. With capacity planning, for example, you can find out how much hardware you need to support a specific load, but it is unusual for a site to reach the hardware capacity right away, because the limiting factor is usually the application or database design or layout. Certainly as your system grows, you may need to do some capacity planning to decide what new hardware would be of most benefit, but you should not do this in a vacuum. Indeed, other than actual hardware errors, and a first cursory look at what you have, your focus in performance tuning should mostly be on application design and database objects. That said, you should still make sure you are getting the most out of your server, including everything from disk arrays and data file placement to the CPU and memory.

Maximizing Performance

You will find any number of books that tell you how to maximize performance on your system by extensive monitoring, tweaking a few settings in sp_configure, and continually upgrading the hardware. Beyond a correct configuration for your system usage, however, there is not much more you can do at this level. Any steps you take to compensate for application performance are just that: compensation. They should be viewed as temporary measures. Do not spend an excessive amount of time monitoring your system for opportunities to improve the hardware when you could spend that time optimizing code instead.

The primary risk in concentrating on hardware rather than design is money. Here is a common scenario: A growing company has an application that was designed for a small user base. The company is very successful, and now it has many more users on its system. Managers make a decision to upgrade their hardware to a four-way processor server with external storage, to speed up the application. Fortunately, this works nicely.

As they continue to grow, they need still more performance: Because upgrading hardware worked before, they do the same thing again, this time upgrading to an eight-way processor server with a System Area Network (SAN). Performance increases, although not as much as before. Now the budget comes in for an application enhancement project, but it is a much higher estimate than it was before. Now application data is spread throughout the company, used in many reports, and exported to other systems as well. Redesigning the core application will mean redesigning each interface as well. Additionally, many more users will be affected by the application change; rolling out changes to desktops and laptops is now a significant part of the project. Analysis also reveals that the system now requires more extensive maintenance to accommodate the original design, and that the practical limitations of the design limit the disaster recovery options and also the service level agreement (SLA).

The application cannot be linearly upsized to the next largest server without incurring a much larger expense, and preliminary investigation into scaling out, rather than up, has shown that the expense is not only higher for hardware, but also would require more maintenance cost, and the addition of support staff. The users, on the other hand, are incurring lost time waiting for system response. Because the correct approach was delayed for too long, the business is now incurring risk based on a database design.

Capacity Planning

Capacity planning does not need to be a complicated process, although it can be if your system is very large or complex. Traditional capacity planning will tell you how much hardware you need to support a specific load on the server (assuming that the application takes full advantage of the hardware). As your system grows you may need to do this evaluative process more than once, and as you gain experience with it, you will undoubtedly refine and enhance the process for your individual system.

For a smaller system, the level of detail in a thorough capacity planning process may seem like overkill. It is still a good idea to go through the steps of capacity planning, as the process itself reveals much about your system. Collecting data for the plan is an iterative process: It must be repeated to accurately reflect the changes in usage over time. Detailed plans that involve careful collection and analysis of data generally are of immediate practical use in two situations: when you are planning for the purchase of hardware for a large system, or when your system has outgrown its current hardware and you must justify the purchase of additional hardware.

However, the best practice for capacity planning is to monitor your system periodically, making certain that you communicate the disparity between what the server is capable of and what the application can support.

For detailed coverage of capacity planning techniques, see the Microsoft SQL Server 2000 Administrator's Companion.

General Hardware Recommendations
CPU Planning

Processor planning is fairly straightforward. Monitor your current CPU utilization (\Processor(_Total)\% Processor Time). If the average is over 75 percent, if you have frequent peak usage periods where the current CPU utilization spikes over 90 percent, or if you have a situation in which the usage spikes and stays up for a while, then you should consider adding either additional processor(s) or faster processors.

In general, the processors you choose should be able to deliver the speed implied in your other system purchases. If your system is highly specialized and filled with processor intensive activities, you will become aware of that as you observe the system over time. Examples of such activities include extensive or frequent usage of Data Transformation Services, or anything involving a lot of calculations (science, accounting, and so on). SQL Server is a CPU-intensive application, so look for processors with a large high-speed cache.

If you have a dedicated SQL Server computer, use all of the processors for SQL Server. If your system is running applications in addition to SQL Server (such as Microsoft Commerce Server), then consider restricting SQL Server from using one or more processors. Otherwise, allow SQL Server and the operating system to balance across all processors.
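
If you do decide to restrict SQL Server to a subset of the processors, the affinity mask option is the mechanism. The scenario below is hypothetical (a value of 15, or 0x0F, binds SQL Server to processors 0 through 3 on an eight-way server), and the earlier caution about changing sp_configure settings still applies:

    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    EXEC sp_configure 'affinity mask', 15    -- bind SQL Server to CPUs 0 through 3
    RECONFIGURE
    -- Depending on the option, a change such as this may not take effect until the service is restarted.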

Memory Planning

While the sum of all hardware together dictates the capacity of a system, memory is mainly focused on data access optimization. SQL Server uses memory to store execution plans, data pages between uses, and so on. Without enough memory, you will incur more disk I/O in reading data. If your system does many reads, you might reduce disk I/O by significantly increasing your memory, because the data will then remain in cache. Insufficient memory, or over-allocation of memory, can result in paging. Memory plays an important role in SQL Server, and is a resource you should carefully monitor.

For systems where reads are the highest priority (DSS), more memory is better. Memory can be used to compensate for disk I/O, and large amounts of memory can significantly decrease the number of disks (spindles) you will need to achieve high performance.

For systems where writes are the highest priority (OLTP), memory is still an important part of the system, but you may benefit more from the addition of disk spindles and controller channels, rather than memory. To be sure, you will need to monitor your system carefully to see which resources are in highest demand.
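
If monitoring shows that SQL Server should be capped, for example to leave memory for other services on the same computer, the max server memory option sets the ceiling. The 3072 MB value below is purely illustrative:

    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    EXEC sp_configure 'max server memory', 3072    -- cap the buffer pool at 3 GB
    RECONFIGURE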

Disk Planning

The important point to remember about data storage is that the number of disks is far more important than their total storage size. One big physical disk may hold all your data, but it still has only one disk arm to execute each data request; more disk arms result in better performance. So, when you size for new disks, check that you will have enough drive space, but spend more time analyzing how many spindles you really need. For example, if your system handles a high volume of transactions, you will enhance performance by adding more spindles (provided there is sufficient memory and CPU to support the system as well).

When you are ordering your hardware, request a specific number of disks, rather than a specific amount of disk space. You want to have many smaller disks rather than fewer larger disks. If you have external storage, purchase the fastest array controller card and one that has multiple channels. Look at this card as a potential bottleneck: If you have many spindles, you need to make the investment in the card to support them. How much performance you get will be directly proportional to the quality of the controller, and the type of I/O your system produces.

In OLTP, the disks spend more time seeking data (random I/O), so the controller channel does not become saturated as quickly and you can attach more disks per controller card.

In DSS, where more of the queries are apt to be sequential reads, you will need more controller channels for a smaller set of disk drives. Also remember that memory is a good way to reduce disk I/O in a DSS system.

Important Using software RAID on a server running SQL Server will increase your CPU usage. Using file compression is not recommended because it will also increase your CPU usage.

For more information on all of these topics, please see SQL Server Books Online. Also see Inside SQL Server 2000 and Microsoft SQL Server 2000 Administrator's Companion. 

Working with Existing Hardware

If you are working with hardware you already own, you should be sure that you are getting the most out of it. At the outset, you must observe a few things about your system to configure the hardware for it correctly.

What kind of system is it? Is it DSS or OLTP? Any kind of reporting system, or any system that is query oriented and does not support transactions falls into the DSS category. OLTP includes any system doing work that consists of altering data: e-commerce, order processing, accounting, and so on. These two basic high-level categories are the starting point for system configuration. The differences between them begin at a hardware level.

Is the application mission critical? If the success of your company depends on the application, if the cost per hour of system downtime is high, or if an entire department simply goes home if they cannot log in, then those are signs that this may be mission critical.

Is the system 24 x 7? If it has to be available all day every day, and it is mission critical, you will need to investigate high availability systems. For more information about high availability, see Chapter 16, "Five Nines: The Ultimate in High Availability."

If it is not a 24 x 7 system, define the hours of required uptime. This will not only help you on the hardware side, it will also let you know how to support the system. If you do not already have an SLA, collecting this information will help you create one.

How many concurrent sessions do you need to support? How many transactions (selects, inserts, updates, and deletes) will each user do per minute?

How much space do you need (in MB) for data and indexes? How big is the log and tempdb? If there are multiple databases, you will need estimates for each.

Storage Subsystem Design

When you are ready to configure the disks, you should make a diagram of the hardware to help you decide how to configure it in accordance with the data file layout strategy you need. Be aware of a few concepts. Apart from the total disk space you need, give less attention to the size of the hard drives than to the number of hard drives in a set. This was important when ordering the hardware, and it is important now as you lay out the data files.

Before you diagram your subsystem layout, there are some things to consider beyond the data storage components: channel, RAID level, and Read/Write cache settings.

Standardize Your Configuration

Although this is vital in an environment with many servers, standardizing configurations across servers is useful in any environment. To do this, create a list of all available settings on the server that you are not leaving at the default value, and document them. This would be labeled as the standard configuration for every new computer. Obviously, many of your systems may need to deviate from the standard settings. To handle this, document the difference for that computer between the set standard and the server in question. These documents, stored as read-only files in an easily accessible area (such as a Web repository), can help you better support a large number of servers.

For the storage subsystem, standardizing drive letters is very helpful. For example:

Logical drive letter    Description
C                       Operating system, SQL executables
D                       Usually a CD-ROM drive
E                       Reserve for another system drive, if needed
F through H             tempdb
I through P             Data files
Q                       Quorum drive
R, S                    SQL executables and system databases
T through V             Transaction log files
X, Y, Z                 Backups, or imported data for bulk loading
\SQLAdmin               A standard admin directory to store reports such as logs, trace results, sqldiag output, and so on

For network drives, consider using Universal Naming Convention (UNC) paths to allow maximum portability of code relying on these network drives.

RAID

For a database server, you should choose hardware level RAID rather than software RAID. Software RAID uses CPU cycles, and this prevents SQL Server from performing at the maximum level. Two core RAID levels are of value for a database server: striping with parity (RAID 5) and striped mirror (RAID 0+1). The best overall option is to choose RAID 0+1 (also called RAID 01 or "striped mirror"). RAID 5 can be used in certain circumstances, but is generally more expensive in the long run, and less reliable.

Data is protected by keeping redundant copies of it on different physical disks, so if one disk is lost through hardware failure, enough information exists on the remaining disks to reconstruct the original data. In RAID 5, each write actually takes four I/O operations: the existing data and parity blocks are read, and then the new data and parity blocks are written. This is slow for two reasons: first, the operations are consecutive, so each must wait for the one before it; second, they occur while many other transactions are vying for the disk resources. RAID 0+1 writes to the primary disk and the mirror in one operation; although you do have to wait for the write to complete on both drives, the two writes are simultaneous.

The hidden cost of RAID 5 is its lower write performance, a limiting factor on the number of records that can be entered, orders that can be processed, and data requests that can be returned. When you decide to use RAID 5, which seems more economical at first glance, you are actually balancing a few thousand dollars in hard drives against system productivity.

The advantage of RAID 5 is that it gives you more storage space. So, the question to ask is: Do you need to store a lot of data on as little hardware as possible (but still have some fault tolerance), or do you need fast access and higher fault tolerance? Take a look at the size of your data files before making this decision, even on a small server with internal-only drives.

Given the same number of physical disks, RAID 0+1 will be faster than RAID 5. An increase in the number of drives in RAID 0+1 provides a linear increase in I/O capacity. A read can be obtained from either the primary drive or its mirror, so as I/O increases, the drives will become faster as a set because they can perform multiple reads simultaneously.

You should understand how choosing a RAID option affects what the loss of a RAID drive means to your server. Although the loss of a RAID drive can sometimes affect the operation of a server, you can recover the data by restoring a backup of the database. In RAID 5, if any two disks fail, the database will stop (unless you have a hot standby disk that has had time to be synched within the chain; even then, you cannot lose more than one disk from the whole working set). RAID 0+1 will stop the database only if a disk fails in both sides of a mirrored set at the same time, and the odds of that occurring based on random factors are about 5.3 percent. RAID 5 also imposes a significant penalty for losing even one disk. When one drive is lost on a RAID 5 system, the read performance of the system immediately decreases. Every read or write request to the failed drive initiates a verification process against all other drives in the parity group. This performance degradation will exist until the drive is replaced and completely rebuilt by the system. During the rebuild process, the system will be more sensitive to load due to the considerably heavier I/O requirements of the failed array, as described previously. This can be a critical consideration.

RAID 0+1 will see minimal loss of performance in a failed state where the hardware allows reads from both disks in a set. In this case read performance would be slightly reduced, but only for data stored on that particular set. RAID 0+1 can actually read simultaneously from both drives in a mirrored set. This is not a simultaneous read for the same I/O operation, but for different ones. So when you have multiple read requests for the same physical disk, the I/O operations are spread over the two disks in the mirrored set.

Changing the Subsystem

After a RAID level is selected, it can be changed only by complete reconfiguration at the hardware level. This will destroy all the data on that drive, so if you decide to change your RAID level, make sure that you have verified backups of the system, and that you also either have a sufficient window of agreed-upon downtime to rebuild it, or have a standby server to take over during the implementation. Before you begin to reconfigure the storage subsystem, carefully plan exactly how you want your data files laid out. Then, according to those requirements, determine which array each disk will be allocated to, how the arrays will be configured, and how the logical drives will be formatted and laid out.

Disk Controllers

Cache 

Not all write caching is safe for use by a database server. You should make sure that your disk controller has features such as safeguards to avoid uncontrolled reset of the caching controller, on-board battery backup, and mirrored or ECC (error checking and correcting) memory. Check with your hardware vendor to ensure that the write cache includes these and any other features necessary to avoid data loss. Do not implement write caching unless the hardware vendor guarantees that the write cache includes these features and any others required to prevent data loss.

Array accelerator cache settings can be left at the default value, which is typically 50:50 Read:Write. These settings can also be adjusted to favor reads or writes if you know which your system requires. Note that if you are using a Write setting above 0 here, you have enabled write caching.

For more information on SQL Server and caching controllers, see Knowledge Base article Q86903 at http://support.microsoft.com.

Channels 

  • If your array configuration controller supports using more than one channel, make sure you take advantage of it. The only way to check this is to open the configuration tool provided by your controller manufacturer and look at the settings. Use caution when doing this, because if you press OK or Save at any point while viewing the controller settings, you may inadvertently reconfigure your array (which would result in immediate loss of any data stored on that drive).

Windows NT File System (NTFS) Allocation Unit

When you format the new drives in Disk Administrator, you should choose an allocation unit, or block size, which will provide optimal performance. Significant performance gains may be obtained by sizing this to a larger value in order to reduce disk I/Os; however, the default value is based on the size of the physical disk. The best practice for SQL Server is to choose 64 KB because this reduces the likelihood of I/Os that span distinct NTFS allocations, which then might result in split I/Os.

For more information on planning and configuring hardware, see Inside SQL Server 2000. For specific information on your hardware, refer to the customer service site of your hardware manufacturer.

Sample Server Configurations

The following section shows examples of various storage component plans, categorized by system type and size. It is not possible for this chapter to address all of the various situations appropriate to each application, but these examples are based on commercially available servers and standard applications designed by Microsoft customers. Assume for each of these configurations that the computer is a dedicated SQL Server. If you are using these guidelines and have other applications or services running on the computer, such as IIS, adjust the memory levels as needed. For clarity, standardized drive letters are used throughout these examples.

Small Entry-Level System Layout

The following specifications are characteristic of small servers. The key consideration on a system this small is to make sure your application is designed for the growth that you anticipate. To get maximum value out of your hardware investment, you should run optimization cycles as described in this chapter, even on this smaller, growing system.

The primary goal at this level is separating the operating system, page file, data, and transaction logs.

  • Number of processors: two 

  • Memory: 512 MB; 384 MB dedicated to SQL Server 

    Disk configuration: three internal high speed drives

    • Internal 1: 18 GB—contains the OS, page file, SQL executables, and system databases 

    • Internal 2: 18 GB—contains the data files, including tempdb 

    • Internal 3: 18 GB—contains the transaction log for SQL Server 

This server is very simple compared to the later examples. Using RAID in this type of system is generally unnecessary, unless it is a high availability requirement or cost is no issue. If the application uses tempdb often, some disk I/O contention may take place since the rest of the data is also on the same disk. This server will not scale to support an enterprise system, and is a better fit for a departmental server or a company server that sees minimal use. If this server is supporting an application that is crucial to the company and has substantial activity, you definitely need to consider a larger server. To help the server scale along with the business, log the baseline monitoring counters (described earlier) on a regular basis, perhaps monthly. Document values of key counters, perhaps using a spreadsheet, so you have a permanent record to use for planning. These can be presented in report format, with a simple line graph showing the rise of specific counters as they change over time. A more sophisticated method of reporting system growth is detailed earlier in the chapter.

This server may function as either a transaction processing or reporting system, but not as both at once, unless the usage for either system is very low. The difference in disk access patterns between the two types of systems would cause an excessive amount of disk queuing and paging. You can still run simple reports on an OLTP system, and you can also run DSS style queries during off hours or build OLAP cubes during off hours. However, large analytical queries akin to those in a data mart would cause unnecessary slowness for the users of an OLTP system on the same server, provided they were both running during standard work hours.

A database server should never be used as a file server. This additional and random pattern of disk access would be even more of a problem than DSS reporting. A fileshare on a database server also may introduce security risks and, even worse, opens up the possibility that a vital drive will fill with user files unless quotas are imposed at the operating system level.

Expanding Capacity

As system usage grows, application performance may begin to suffer. After analyzing the system, you may decide to increase the capacity of the server by adding an external RAID array for the SQL Server data and log. SQL Server is reconfigured to use the existing files in their new locations. The new configuration might look like this:

  • Number of processors: two 

  • Memory: 2 GB; 1.5 GB dedicated to SQL Server 

  • Disk configuration: two internal high speed drives, one SCSI RAID controller with two channels, external 12 disk bay array with eight drives. All drives are configured as RAID 0+1.

    Drive      GB    RAID partition    Total disks    Files
    Drive C    18    N/A               2 internal     OS, page file, SQL executables, and system databases
    Drive I    54                      6 external     Data files and tempdb
    Drive T    18                      2 external     Transaction log

A common assumption is that with more capacity, running DSS on the same system would now be acceptable. Disk access is only part of the issue: even if you were to separate the disk pairs to completely isolate I/O for the DSS system, you would still be sharing the processors and memory. It would be worth the slightly higher investment to purchase two smaller servers instead. This would be most beneficial in cases where DSS users are allowed to access tables ad hoc.

This system will be noticeably faster. It has more memory and more spindles. Additionally, the transaction log has been split onto another RAID channel, which will result in some performance gain by itself. The data is now spread over six disks (three usable disks and three mirror disks). This number of spindles will provide good performance, provided the system has been properly optimized at the code and database level. The transaction log has only one usable disk, but this can support up to 1,000 transactions per second, and since these are all sequential writes, this configuration should be more than adequate for most growing systems.

If your system performs a lot of replication activity (for example, transactional replication frequently during business hours), you will want to leave the Read:Write cache at the default of 50:50, and consider adding an extra pair of disks to the transaction log (if replication is very heavy). The reason for this is that the log always writes and moves forward, except in two situations: in replication and in rollbacks.

OLTP System Server Layout

For this type of system, the transaction log for the application is split over multiple drives, so the site can support a larger number of transactions per second. For safety, backups are stored on a separate disk, which can also be used to hold data to be imported into a table. Memory is set to dynamically adjust itself.

  • Number of processors: four (up to 32) 

  • Memory: 4 GB (up to 64 GB)

  • Disk Configuration: 12 internal drives, one SCSI RAID controller with two channels, external 12 disk bay array with 12 drives. All drives are RAID 0+1.

    Drive      GB     Total disks    Files
    Drive C    18     2 internal     Operating system, page file, SQL executables, and system databases
    Drive T    54     4 internal     Transaction log
    Drive Z    18     2 internal     Backups/Imported data files
    Drive I    126    14 external    Data files, tempdb

Note that this server is designed to support one heavily used application, or several lightly used applications. The difference lies in the isolation of resources. If you support multiple systems on the same server, and you wish to isolate resources a bit further, consider making more than one transaction log RAID array. Because data is written to the transaction log sequentially, you might get better performance per application, depending on usage, by splitting the transaction log array (Drive T in this table) into two or three arrays, one for each application's transaction log. A system that supports replication or performs many rollbacks, in addition to a high transaction volume, will need more disks (spindles) than one that does not have any such random I/O on the transaction log drive.

The data and tempdb remain striped across as many spindles as possible. Only on a very large system, one with both heavy usage and many disks available, would you consider separating the data files onto separate sets of physical disks. The only exception to this would be if multiple channels are available on your array controller. If so, then make full use of them. Using channels increases the amount of disk I/O you can support.

DSS System Server Layout

DSS focuses more on tempdb than on transaction log use. It also makes different use of CPU and memory because of the different type of data access. Memory is set to dynamically adjust itself. Here is a sample layout for a DSS system.

  • Number of processors: eight (up to 32) 

  • Memory: 12 GB (up to 64 GB) 

  • Disk configuration: 12 internal drives, two SCSI RAID controllers with two channels, two external 12 disk bay arrays with 12 drives each. All disks RAID 0+1.

    Drive      GB    Total disks    Files
    Drive C    18    2 internal     OS, page file, SQL executables, and system databases
    Drive T    54    6 internal     Transaction log
    Drive Z    36    4 internal     Backups/Imported data files
    Drive I    72    8 external     Data files
    Drive F    36    4 external     tempdb, possibly some data files
    Drive J    90    10 external    Data files
    Drive T    54    6 external     Data files, possibly indexes

Multi-Instance N+1 Failover Cluster Configuration: SQL Server 2000 on Windows 2000 Datacenter Server

In this scenario, you have four servers of similar internal disk configuration, which share an external Fibre Channel SAN. Three SQL Server instances are active in the failover cluster. The requirements for CPU and RAM will vary depending on the role the server plays in the cluster. Three of the cluster nodes are exactly the same, and own one instance apiece. The fourth node is the designated failover node, and requires a larger capacity in case all three instances fail over to it. AWE memory will be used. A failover cluster requires a well thought-out and certified hardware solution. For more information on failover clusters, AWE memory, and the N+1 configuration, see Chapter 12, "Failover Clustering."

Three active instances: These three servers should be exactly the same.

  • Number of processors: eight 

  • Memory: 6 GB, SQL Server limited to 4 GB 

  • Disk configuration: 2 to 4 internal drives. RAID 1. 

One failover: The failover node must have enough memory and CPU to support all three active instances, in case of failover.

  • Number of processors: 32 

  • Memory: 16 GB 

  • Disk configuration: 2 to 4 internal drives. RAID 1. 

SAN with Fibre Channel storage, 84 drives total:

Drive      GB     Total disks (external)    Files
Drive Q    18     2                         Quorum drive
Drive T    36     4                         Instance 1: Transaction log
Drive U    36     4                         Instance 2: Transaction log
Drive V    18     2                         Instance 3: Transaction log
Drive I    90     10                        Instance 1: Data files, tempdb
Drive J    108    12                        Instance 2: Data files, tempdb
Drive K    162    18                        Instance 3: Data files
Drive L    72     8                         Instance 1: Data files, possibly indexes
Drive M    72     8                         Instance 2: Data files, possibly indexes
Drive N    72     8                         Instance 3: tempdb
Drive Z    36     4                         Backups/Imported data files

It is important to keep in mind the separation of the internal disks from the external arrays on a cluster. The internal drive configuration will be similar on each server, and nothing that needs to be accessible to the failover instance should be stored on any internal drive.

Due to the way failover clustering works, each instance owns specific logical drives, which cannot be used by other instances. However, the logical drives can still be part of a single RAID partition, which is made up of many physical disks, as shown here. For this example, Instances 1 and 2 are OLTP applications with similar access patterns. Instance 3 is an example of a DSS type system that uses tempdb heavily enough that you would want to move it to a different drive. Note that, correspondingly, Instance 3 has the smaller transaction log drive (only two disks).

Conclusion


The architectural advances inside SQL Server mean that the DBA now focuses attention in a slightly different way. SQL Server is a flexible product, with many different levels. If your system is simple, then it may be simple from an administrative perspective as well.

The concepts and processes here are not new. This unified approach involves a simple shift in the focus of the DBA's job, based on advances in SQL Server 2000, to accommodate the new complexities of managing a database system. The data tier is not a disparate collection of systems to be analyzed in isolation, but a complex synthesis of all database system components.

