SQL Server 2005
Boost Performance with New Features in SQL Server 2005
At a Glance:
- Developing performance enhancing strategies
- Using snapshot isolation
- Implementing covering indexes
- Introduction to Service Broker
With the introduction of SQL Server 2005, Microsoft has
released its third-generation database replete with both new technologies and significant enhancements to existing ones.
Of course, SQL Server™ 2000 is still a great database. It is stable, fast, well-understood, and has a healthy third-party marketplace. But the SQL Server team has put together some incredible new technologies for SQL Server 2005 that may offer a significant performance improvement in your environment.
Some of the features are aimed at new architectural patterns such as service-oriented architecture (SOA). Upgrading your database around SOA requires quite a bit more than just running an upgrade wizard; it involves a redesign of your data access architecture. Many of the new performance and scalability enhancements will benefit your database with very few, or even no, architectural changes required.
Some of the performance enhancements even come from the new ways SQL Server is being licensed. In this article, I’ll highlight nine of what I consider to be the most significant new SQL Server 2005 performance and scalability features, beginning with the out-of-the-box enhancements.
Enhancing performance extends beyond any individual feature, so before I detail the specific new performance-related features of SQL Server 2005, I want to take a moment to discuss the concept of performance optimization. You can’t expect to move a poorly designed and implemented database to SQL Server 2005 and see significant performance improvements. SQL Server is a serious development environment and SQL Server 2005 is just as complex, if not more so, than any of its predecessors. Optimizing a SQL Server 2005 database requires a broad set of skills.
Database Optimization Theory outlines five major areas that affect database performance: the database schema, set-based code, indexes, concurrency, and server tuning. Each of these areas builds on the previous area. A poor configuration in one area may create a domino effect of cascading poor performance. Conversely, an optimal configuration in each area can help ensure success in each subsequent area.
A Clean Database Schema A clean database schema makes it easier to develop fast set-based queries and code. You can never put too much effort into designing the schema—any extra time spent simplifying an elegant schema will pay off many times over in development and maintenance. What if the schema is already in production and you’re only optimizing the installation? Here’s where a solid data abstraction layer, be it in T-SQL or in the Microsoft® .NET Framework, pays off. If all data access goes through the abstraction layer, then the schema and the abstraction layer can be slowly enhanced without breaking any external code. Without a data abstraction layer, the database schema is brittle.
Set-Based Code Set-based code runs significantly faster than iterative, cursor-based code. But good set-based code won’t overcome an overly complex schema.
Indexes Indexes are the performance bridge between queries and the data. The goal of indexing is to reduce the number of page reads. A well-designed indexing strategy analyzes all the queries that access a table and designs the smallest set of indexes that serves the need. But indexes won’t overcome the performance nightmare of iterative code.
Concurrency Every transaction benefits from a clean database schema being accessed by set-based queries optimized by good indexes. Add the right transaction isolation level, tighten up the transactions, and concurrency locking and blocking problems will fade away. But even tight transactions can’t overcome the lack of a needed index.
Server Tuning Running SQL Server on a server with sufficient memory and I/O performance is vital. Remember though, that even the best new hardware will not help you conquer the root problems of an overly complex schema, iterative code, poor indexing, and long transactions. While a shiny new server will run the database faster, I’ll bet that most of those new CPU cycles are wasted. You’ll see the greatest value for your resources by optimizing the database and then tuning the server.
While improvement at any level can result in some performance gain, these five areas of performance optimization work in harmony. Each area enables performance gains in the next area. So, with this view of performance in mind let’s dig into new SQL Server 2005 performance-tuning features.
1 It seems that CPU performance hasn’t kept up with Moore’s Law (which says that hardware doubles in performance every 18 months) in the past couple years. However, the next round of CPUs with multiple CPU cores on the chip should help return equilibrium. I list this as a significant performance benefit because Microsoft is licensing SQL Server 2005 by the CPU socket and not the number of CPU cores. You read that right—for both the licensing cost and the CPU restrictions in the various editions of SQL Server 2005, Microsoft is only counting the number of sockets. This is something that is incredibly important to keep in mind as you plan your next round of server upgrades. You can now install a single CPU license of SQL Server 2005 on a server with a dual-core CPU, or max out a Standard Edition installation with four quad-core CPUs and run it as if there were 16 CPUs in the server. The licensing saving alone can justify both the hardware and SQL Server 2005 upgrades.
2 There are several major enhancements in SQL Server memory management that contribute to improved performance. Among them is the uniform memory management framework introduced in the new SQL Server Operating System Layer (sometimes referred to as SQLOS). It provides common memory brokerage between different components of SQL Server which improves performance and gives operational flexibility under various memory pressures. The uniform dynamic caching framework also improves the behavior and locality of the internal caches in a wide range of operating conditions.
In addition, SQL Server 2005 is non-uniform memory access (NUMA) architecture aware and can take advantage of memory locality which provides greater scalability. SQL Server 2005 also supports dynamic management of conventional, locked and large-page memory as well as Hot Add Memory.
Major enhancements in memory tracking, including tracking of internal memory allocations between components and external operating system-wide memory events, provide superior supportability features in memory management for SQL Server 2005.
For 32-bit installations, the use of address windowing extensions (AWE) memory has been improved so that SQL Server 2005 Standard Edition can use more than 3GB of RAM. Also, with 64-bit processors increasingly more prolific and affordable, the benefits of 64-bit server installations are becoming much more accessible. A 64-bit architecture provides significantly larger directly addressable memory space (compared to 32-bit) for all database operations, and eliminates the need for the AWE mechanism.
3 In the past, the database engine would detect that the current query execution plan might be obsolete due to a significant change in the index statistics or mixed Data Definition Language (DDL) and Data Manipulation Language (DML) statements among many other reasons. This would cause SQL Server 2000 to pause the execution of the stored procedure or batch. Then the query optimizer would generate a new query execution plan for the entire stored procedure or batch before continuing the execution. For some intensive stored procedures the performance cost of multiple recompiles was significant.
SQL Server 2005 can now perform statement-level recompiles in most situations, which, depending on the complexity of your stored procedures and the data events in your database, might provide significant out-of-the-box performance enhancements in an area that would otherwise be difficult for redesign.
4 One measure of a database is how well it balances the need to isolate transactions against the need to share the same data resource between multiple processes. The SQL Server lock manager dynamically adjusts the granularity (for example, row, page, table) and the number of locks to ease the contention between shared locks (reading data) and exclusive locks (writing to the data). You can also set the transaction’s isolation level to adjust the duration of the lock depending on the strength of the transaction integrity required for the database. However there’s always contention between those processes that read data and those that write data, and invariably some processes block other process causing those transactions to take longer. This in turn causes a domino effect of slowed performance. For heavily used online transactional processing (OLTP) databases, concurrency is a primary performance issue.
SQL Server 2005 adds a new transaction isolation level that may completely change the way you think about locking and blocking. Snapshot isolation takes a point-in-time snapshot of the data to be updated, using row versioning, and the updating transaction works on the snapshot copy of the data, leaving the original data free for reading. For databases that see significant updating while others are selecting data, the concurrency logjam is released. Reading operations no longer wait for writing operations and writes don’t wait for reads. Writing operations update the data in the snapshot copy, and can take as long as it wants without causing any locking or blocking.
There is a cost to snapshot isolation though. If two client applications both retrieve data, change their copy of the data, and then both attempt to write the data back to the database, the second application will overwrite the first application’s changes. This is called a lost update error and the same problem can occur between multiple write operations using snapshot isolation. If there are two write operations using snapshot isolation, SQL Server 2005 resolves the problem by blocking the second transaction. So, in effect, snapshot isolation trades conflict between reading and writing for conflict between multiple write operations. For databases with both heavy reads and inserts, with little updating or at least concurrent updating of the same resource, snapshot isolation is a solution for concurrency issues.
There are two implementations of this technology to choose from: snapshot isolation and read committed snapshot. Both employ the automatic row-versioning, but the read committed model has less overhead, behaves most like the SQL Server default transaction isolation level, and is the most commonly recommend snapshot isolation level.
Traditionally, isolation levels must be set with T-SQL commands at the beginning of the batch or stored procedure. There’s no way to set a transaction level for an entire database. However, since snapshot isolation will likely be used as a database-level upgrade, Microsoft has offered a method to enable snapshot isolation database-wide with a single setting.
Another cost of snapshot isolation is that it makes heavy use of tempdb. In fact, SQL Server 2005 makes greater use of tempdb than SQL Server 2000 did, so I recommend locating tempdb on its own high-performance drive system.
5 The contention between database readers and writers also plays out in designing indexes. Indexes are great for reading from the database, but they must be updated so they slow down database writes. Don’t fall for the myth that indexes only serve database readers. Indexes are vital in locating the row to be updated. Finding and updating the row rapidly reduces the transaction time and the whole database runs smoother.
One of my favorite indexing tactics is a covering index, meaning that the index covers, or handles, all of the data required by the query. Since SQL Server 7.0, the query optimizer has been able to retrieve the data from the index without having to load the table’s data pages. When data is located using a non-clustered index, SQL Server uses a bookmark lookup operation to jump from the non-clustered index to the clustered data pages. To see a bookmark lookup in the query execution plan, write a query that uses a non-clustered index in the WHERE clause.
This becomes significant when the query is retrieving multiple rows that may be located in the same page in a non-clustered index, but spread out over the clustered index. Rows that might have been identified from two non-clustered index pages are then fully retrieved using the bookmark lookup to retrieve data from perhaps dozens of data pages. Since the whole purpose of indexing is to reduce the number of physical page reads, if all the required columns can be located and then retrieved from the non-clustered index pages without having to access the clustered data pages, that’s a well-designed performance improvement.
In previous versions of SQL Server there was no specific command to create a covering index, and no query index hint to use. If the index was there, the query optimizer would automatically use the index.
As with most things in life, it’s always a trade-off between pros and cons. The drawback of a covering index is that modifications to data must also modify indexes, and wide covering indexes by their very nature and are expensive to update. The frustration is that most of the addition columns in a covering index are there simply so data can be retrieved from the index. They are not used for sorting or selecting data, but SQL Server is forced to sort and update every column of a wide covering index anyway. This makes covering indexes as bad for write operations as they are good for read operations. The contention between optimizing read operations and write operations continues.
One of my favorite new features tackles this problem head on. Non-clustered indexes may now be created with some columns sorted and may include additional columns for the sole purpose of data retrieval. The following code demonstrates building a covering index for the Sales.SalesOrderDetail table in an unmodified copy of the sample Adventureworks database:
CREATE NONCLUSTERED INDEX IX_SalesOrderDetailCovering
(SalesOrderID, SalesOrderDetailID, UnitPrice,
This next query searches for order details that include the red sport helmet and used the sport helmet discount, and returns some basic data about that order detail line item:
SELECT SalesOrderID, SalesOrderDetailID, ProductID, SpecialOfferID,
UnitPrice, OrderQty, UnitPriceDiscount
WHERE ProductID = 707
AND SpecialOfferID = 8
Without the covering index, SQL Server does a clustered index scan. However, with the covering index in place, the query execution plan reveals that the query optimizer used an index seek to locate and retrieve the data and select to return the columns. The final result? The total number of logical reads drops from 1241 to 4, and the query is about three magnitudes faster. And thanks to SQL Server 2005, the covering index no longer carries the overhead of slower updates.
6 More than just an enhancement of partitioned indexed views, table partitioning dramatically improves the ability to organize a very large table into smaller tables. Not only does table partitioning mean smaller index structures, it also allows index maintenance to be performed on smaller tables. Typically, table partitioning benefits performance for very large tables consisting of mostly static data and archival data, and a much smaller set of active data.
7 The word most often used to describe XML is ubiquitous. The question is whether XML data belongs inside the database or if it’s just for data transportation. Since XML is better at handling unstructured data, it’s likely that future databases will be a hybrid of relational and XML data. SQL Server 2005 includes three technologies designed to provide high-performance XML capabilities. Instead of storing XML data in a text column, the new XML data type can store up to 2GB of XML data per row. XML data can be indexed by node, path, or values. And XQuery FLOWR queries can be used within SQL queries to join relational and XML data. XML and performance are no longer at odds.
8 Service Broker is a quiet new feature that might be easily overlooked or misunderstood, but it’s actually a very powerful enabling technology that is aimed directly at performance and scalability. I believe that in the right architecture, Service Broker is the most significant scalability feature in SQL Server 2005. In essence, Service Broker is a large payload queue perfectly suited to be a work queue or a message broker between two internal services.
As a work queue, Service Broker can time balance the load. For example, if the Web site receives the majority of its order requests in the last hour of the work day, those requests can be acknowledged to the user, and placed in the queue for processing as soon as possible instead of being processed in real time and bogging down the system. Similar to metering traffic with a stop light on the entrance to the freeway, Service Broker can let the incoming transactions pile up in the queue while a few stored procedures pull the transactions off the queue and processes them smoothly.
Similarly you can use Service Broker for a database that has a large amount of processing validation inside a trigger. (I know this is a bad practice, but it’s out there.) Rather than perform all the processing in the trigger, just update the row with a pending flag and add the primary key to a Service Broker queue. The original transaction can commit immediately and the whole system will seem faster. The longer validation or processing can then be performed by another stored procedure in the next few minutes and the row’s pending flag can be set to validated.
If you’re employing SQL Server 2005 as an SOA/Web Services data store using the new high-performance HTTP endpoints and the native SOAP features (very cool, by the way) for the maximum scalability, it makes sense to receive the SOAP message from the endpoint, place it in a Service Broker queue, and acknowledge the receipt of the message without any processing. Then let another stored procedure receive the message from the queue and process it in a loosely coupled design.
So, how does Service Broker actually work? In brief, the structure of a queue begins by defining message types, contracts, conversations, services, and queues. The queues are SQL Server tables so all of the SQL Server availability and transactional functionality is right there. Within a transaction, the initiating process opens a conversation and sends one or more messages with the same conversation ID. Later, another process receives from the queue the messages within the conversation. Service Broker guarantees that the messages will be received once and only once, in the order they were placed on the queue.
The process receiving the message could be running on another SQL Server instance, so Service Broker can be designed to asynchronously broker messages between different locations or it can be used to scale out between multiple servers.
9 One of the most exciting additions to SQL Server 2005 has to be the new common language runtime (CLR) integration—the ability to execute .NET managed code within the SQL Server. If you haven’t played with the CLR, the basic flow is that you create a new SQL Server project in Visual Studio®, create a new item of type stored procedure, user-defined function, trigger, user-defined aggregate, or custom data type. Of course, the CLR is turned off by default and must be explicitly enabled. Once the code is complete, you build and deploy the new DLL. Visual Studio will automatically install the DLL on the project’s data source SQL Server. Or you can move the DLL to another SQL Server and use the new Create Assembly DDL command to import the .NET-based code into SQL Server.
Including the CLR inside SQL Server is a serious undertaking for Microsoft. SQL Server isn’t calling a .NET assembly externally; SQL Server is actually hosting the CLR and managing the resources internal to SQL Server. This provides several key performance benefits. Instead of using an ADO client connection between the .NET-based code and the database engine, SQL Server provides a highly optimized internal native connection. And since SQL Server is managing the resources, it can also protect against memory leaks and errors. In effect, SQL Server is functioning as a mini-operating system while managing the .NET-based code.
I believe the most common initial performance benefit from the CLR will be moving scalar user-defined functions from T-SQL to CLR. Compiled .NET is dramatically faster than T-SQL at simply crunching variables and calculating expressions. If the code doesn’t need to access that database directly, then the CLR is definitely the way to go. Some early tests show that the CLR can be up to one hundred times faster than T-SQL at pure computational programming.
However, T-SQL is still the native SQL Server language and is still faster at data manipulation tasks (pure DML—selects, inserts, updates and deletes) than using .NET and ADO.
So, where will it make sense to use the CLR? First, move any extended stored procedures into the CLR so they’ll run internally to SQL Server. Next, refactor your scalar user-defined functions that don’t access data since they benefit the most from the most performance gain with managed code. Lastly, if you have a business tier written in .NET with complex rules that are better written in .NET than T-SQL, keep the layers logically separated, but consider testing the performance with some of them moved to the CLR within SQL Server. If the server has enough CPU headroom, the code will benefit from the tight integration and the close proximity to the data.
However you decide to employ the CLR, please don’t convert all your stored procedures and functions to the .NET Framework because you prefer C# to T-SQL. I understand that C# is a more sophisticated language, but T-SQL is still the language of SQL Server.
Summing It Up
SQL Server has always encompassed a complex set of database technologies, and SQL Server 2005 is the most complex version of SQL Server so far. The next generation of database solutions will play in a new architectural landscape and SQL Server 2005 is ready with the technologies and the performance required.
Paul Nielsen is a mentor with Solid Quality Learning, author of SQL Server Bible series, and may be reached through www.SQLServerBible.com.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited