Chapter 15 - Storage Engine

Article
01/28/2010

Ten years ago, it was common for database application development to take months or years. When databases were built, everything was worked out in advance: the database size, schema, number of users, and so on. Now database applications are developed over weeks or months, evolve during the process, and are put into production before all issues are fully understood.

This rapid deployment of mission-critical applications places rigorous demands on the storage engine, which needs to be highly available and have a fast recovery system and automatic management utilities.

Storage Engine Goals

There are several important goals for the Microsoft SQL Server 7.0 storage engine. The defining strategy is to further improve ease of use so that applications using database technology can be deployed widely. Ideally, the database becomes completely transparent to end users and nearly transparent to database administrators.

Ease of Use

Customers are looking for solutions to business problems. Most database solutions bring multiple layers of cost and complexity. SQL Server versions 6.0 and 6.5 defined ease of use as a relational database management system (RDBMS) feature. SQL Server 7.0 takes this concept to the next level, firmly establishing the product as one of the least complex databases for building, managing, and deploying business applications.

For the SQL Server 7.0 storage engine, ease of use includes many innovative features, including:

No database administrator for standard operations. This allows branch-office automation, and desktop and mobile database applications.
Transparent server configuration, database consistency checker (DBCC), index statistics, and database backups.
Streamlined and simplified configuration options, which automatically adapt to the specific needs of the environment.

Scalability

Customers must protect their investments in business applications, and as organizations grow, the database must grow to handle more data, transactions, and users. SQL Server 7.0 delivers a single database engine that scales from a laptop computer running the Microsoft Windows 95 or Windows 98 operating system to terabyte-size symmetric multiprocessing (SMP) clusters running the Microsoft Windows NT Server, Enterprise Edition, operating system. All of these systems must maintain the security and reliability demanded by mission-critical business systems.

These storage engine features are the foundation for scalability:

New disk format and storage subsystem to provide storage that is scalable from small to large databases
Redesigned utilities to support terabyte-size databases efficiently
Large memory support to reduce the need for frequent disk access
Dynamic row-level locking to allow increased concurrency, especially for online transaction processing (OLTP) applications
Unicode support to allow for multinational applications

Reliability

SQL Server 7.0 eliminates many concurrency, scalability, and reliability problems by replacing complex data structures and algorithms with simple structures. The new structures scale better and do not have concurrency problems. They are also less complex, and therefore more reliable.

SQL Server 7.0 eliminates the need to run DBCC checks prior to every backup. Run-time checks of critical data structures result in a more robust database. SQL Server 7.0 drops the recommendation to run DBCC prior to every backup, resulting in a significantly faster DBCC.

Storage Engine Features

The table summarizes the storage engine features of SQL Server 7.0.

Feature	Description and benefits
Data type sizes	Size limitations of data types increased dramatically.
Databases and files	Database creation simplified. Databases now reside on operating-system files instead of logical devices.
Dynamic memory	Performance improved by optimizing memory allocation and usage. Simplified design minimizes contention with other resource managers.
Dynamic row-level locking	Full row-level locking implemented for both data rows and index entries. Dynamic locking chooses the optimal level of lock (row, page, multiple page, table) for all database operations automatically. Improved concurrency provided with no tuning. Database supports hints to force a particular level of locking.
Dynamic space management	Automatic grow and shrink allowed within configurable limits, minimizing the need for database administrator intervention. It is not necessary to preallocate space and manage data structures.
Evolution	New architecture designed for extensibility, with a foundation for object-relational features.
Large memory support	Memory addressing supported for greater than 4 gigabytes (GB) in conjunction with Windows NT Server 5.0, Alpha-processor based systems, and other techniques.
Log manager	Design simplified to improve performance for truncation, online backup, and recovery operations.
Read ahead	Smart read-ahead logic implemented to improve performance and eliminate manual tuning.
Text and image	Text and image data stored separately in an optimized format.
Unicode	Native Unicode implemented, with Open Database Connectivity (ODBC) and OLE DB Unicode application programming interfaces (APIs), to improve multilingual support.

Storage Engine Architecture

Microsoft SQL Server 7.0 scales to both large enterprise applications and laptop applications. This scalability is based on a completely new set of on-disk structures designed to handle applications for years to come. The new formats allow the server to scale from low-end to high-end systems to improve performance and manageability.

The benefits to the SQL Server 7.0 on-disk structures include:

Improved scalability and integration with Windows NT Server.
Better performance with larger I/O operations.
Stable record locators to allow more indexes.
More indexes to speed decision support queries.
Simpler data structures to provide better quality.
Greater extensibility, so that subsequent releases will have a cleaner development process and new features will be faster to implement.

Storage Engine Subsystems

Most relational database products have two primary components, a relational engine and a storage engine. Storage engine subsystems include:

Mechanisms that store data in files and find pages, files, and extents
Record management for accessing records on pages
Access methods by using B-trees, which help find records quickly by using record identifiers
Concurrency control for locking, which helps implement the physical lock manager and locking protocols for page-level or record-level locking
I/O buffer management
Logging and recovery
Utilities for backup and restore, consistency checking, and bulk data loading

Physical Database Organization

Microsoft SQL Server 7.0 databases are now stored in operating-system files, not on SQL Server logical devices. The UNIX legacy of database devices and segments has been replaced with a simple system that maps each database to its own set of files.

Pages and Extents

The fundamental unit of data storage in SQL Server is the page. In SQL Server 7.0, the page size is 8 KB, increased from 2 KB. At the start of each page is a 96-byte header used to store system information such as the type of page, the amount of free space on the page, and the ID of the object owning the page.

There are seven page types in the data files of a SQL Server 7.0 database.

Page type	Contains
Data	Data rows with all data types except text, ntext, and image
Index	Index entries
Log	Log records recording data changes for use in recovery
Text/Image	text, ntext, and image data types
Global Allocation Map	Information about allocated extents
Page Free Space	Information about free space available on pages
Index Allocation Map	Information about extents used by a table or index

Data pages contain all the data types in data rows except text, ntext, and image, which are stored in separate pages. Data rows are placed serially on the page starting immediately after the header. A row-offset table starts at the end of the page.

The row-offset table contains one entry for each row on the page. Each entry records how far from the start of the page the first byte of the row is. The entries in the row-offset table are in reverse sequence from the sequence of the rows on the page. In SQL Server 7.0, rows cannot span pages, and the maximum amount of data contained in a single row is 8,060 bytes, not including the text, ntext, and image data types.

Extents are the basic unit in which space is allocated to tables and indexes. An extent is eight contiguous pages, or 64 KB. To make its space allocation efficient, SQL Server 7.0 does not allocate entire extents to tables with small amounts of data.

SQL Server 7.0 has two types of extents: uniform and mixed. Uniform extents are owned by a single object; all pages in the extent can be used only by the owning object.

Mixed extents, introduced in SQL Server 7.0, work well for small applications. In SQL Server, space is added to tables one extent at a time. In SQL Server 7.0, this can lead to a large overhead for tiny tables because pages are 8 KB. A mixed extent allows allocation of a single page to a small table or index. Only when the table or index has allocated more than eight pages will it begin to allocate uniform extents. Mixed extents are shared by up to eight objects. A new table or index is allocated pages from mixed extents. When a table or index grows to eight pages, it is switched to uniform extents.

Torn Page Detection

Torn page detection ensures database consistency. In SQL Server 7.0, pages are 8 KB, and Windows NT Server performs I/O in 512-byte segments. This discrepancy makes it possible for a page to be written partially, which could happen if there is a power failure or another problem between the time when the first 512-byte segment is written and the 8 KB of I/O is complete.

If the first 512-byte segment is written, it may appear that the page has been updated when it was not. (The timestamp for the page is in the header, the first 96 bytes of the page.) There are several ways to deal with this problem. You can use battery-backed cached I/O devices that guarantee all-or-nothing I/O. If you have one of these systems, torn page detection is unnecessary.

SQL Server can detect incomplete I/O by creating a mask of bits, one bit from each segment in the page. Every time a page is written, the bit is flipped from its previous state (as it was on disk), and the actual state is saved in the page header. If a page is read and a bit is in the wrong state, it indicates that an I/O did not complete and there is a torn page. This mechanism is less expensive than computing a checksum.

You can set torn page detection on and off because the page header is marked when the bit is flipped. If torn page detection is set on and off, then the state in the pages that have been flipped is observed and corrected the next time they are read.

Files and Filegroups

SQL Server 7.0 creates a database using a set of operating-system files, with a separate file for each database. Multiple databases no longer can exist in the same file. There are several important benefits to this simplification: Files can grow and shrink, and space management is simplified.

All data and objects in the database, such as tables, stored procedures, triggers, and views, are stored within the following operating-system files only.

File	Description
Primary data file	This file is the starting point of the database. Every database has only one primary data file. The recommended file extension is .mdf.
Secondary data files	These files are optional and can hold all data and objects that are not on the primary data file. Some databases may not have any secondary data files, while others may have multiple secondary data files. The recommended file extension is .ndf.
Log files	These files hold all the transaction log information used to recover the database. Every database has at least one log file. The recommended file extension is .ldf.

When a database is created, all the files that comprise the database are zeroed out (filled with zeros) to overwrite any data left on the disk by previously deleted files. Although this means that the files take longer to create, it prevents Windows NT from clearing out the files when data is written to the files for the first time (because they are already zeroed out) during normal database operations. This improves the performance of day-to-day operations.

A database consists of one or more data files and one or more log files. The data files can be grouped into user-defined filegroups. Tables and indexes then can be mapped to different filegroups to control data placement on physical disks.

Filegroups are a convenient unit of administration that improve flexibility. With a terabyte-size database, regardless of backup speed, it is impossible to back up the entire database within a reasonable window. SQL Server 7.0 allows you to back up a different portion of the database each night on a rotating schedule.

Filegroups work well for sophisticated users who know where they want to place indexes and tables. SQL Server 7.0 can work effectively without filegroups; therefore, many systems will not need to specify user-defined filegroups. In this case, all files are included in the default filegroup, and SQL Server 7.0 can allocate data effectively within the database.

Log files are never a part of a filegroup; log space is managed separately from data space.

Using Files and Filegroups

Files and filegroups improve database performance by allowing a database to be created across multiple disks, multiple disk controllers, or redundant array of inexpensive disks (RAID) systems. For example, if your computer has four disks, you can create a database that comprises three data files and one log file, with one file on each disk. As data is accessed, four read/write heads can access the data simultaneously, which speeds up database operations.

Additionally, files and filegroups permit better data placement because a table can be created in a specific filegroup. Performance is improved because all I/O for a specific table can be directed at a specific disk. For example, a heavily used table in a database can be placed on one file in one filegroup located on one disk, and another less heavily accessed table can be placed on another file in another filegroup located on a second disk.

Here are some recommendations for files and filegroups:

Most databases work well with a single data file and a single log file.
If you plan to use multiple files, use the primary data file for system tables and objects only. Create at least one secondary data file to store user data and objects.
To maximize performance, create files or filegroups on as many different local physical disks as are available, and place objects that compete heavily for space in different filegroups.
Use filegroups to allow placement of objects on specific physical disks.
Place different tables used in the same join queries in different filegroups. This improves performance due to parallel disk I/O searching for joined data.
Place heavily accessed tables and nonclustered indexes belonging to those tables in different filegroups. This will improve performance due to parallel I/O if the files are located on different physical disks.
Do not place log files on the same physical disk as other files and filegroups.

Space Management

SQL Server 7.0 features many improvements in space allocation and space management within files. For example, the data structures that track page-to-object relationships have been redesigned. Instead of linked lists of pages, bitmaps are used because they are cleaner, simpler, and facilitate parallel scans. Now each file is more autonomous; it has more data about itself, within itself. This works well for copying or mailing database files.

SQL Server 7.0 also has a more efficient system for tracking table space. The changes allow:

File shrink and file grow.
Better support for large I/O.
Row-space management within a table.
Less expensive extent allocations.

In earlier versions of SQL Server, space allocations can cause blocking when large amounts of data were added. The SQL Server 7.0 space allocation algorithms and data structures are simple and efficient and will not cause blocking because SQL Server 7.0 tracks free space on a page. As rows are deleted from a table that does not have a clustered index, that space can be reused for new inserts. This is a more efficient use of disk space and speeds table scans by packing data more densely.

SQL Server 7.0 allocates pages to objects quickly and reuses space freed by deleted rows. These operations are internal to the system and use data structures not visible to users, yet the processes and structures are referenced occasionally in SQL Server messages. This information about space-allocation algorithms and data structures will help users and administrators understand the messages generated by SQL Server.

SQL Server 7.0 introduces some significant changes in the internal data structures that manage the allocation and reuse of pages. These data structures are not visible to end users, so the changes do not affect users other than by improving speed.

File Shrink

Laptop and desktop systems may have restricted disk space, so you can shrink database files automatically if the option is enabled. The server checks the space usage in each database periodically. If a database is found with a lot of empty space, the size of the files in the database is reduced. Both data and log files can be shrunk. This activity occurs in the background and does not affect any user activity within the database. You also can use SQL Server Enterprise Manager or DBCC to shrink files individually or as a group.

File shrink works by moving rows from pages at the end of a file to pages allocated earlier in the file. In an index, nodes are moved from the end of a file to pages at the beginning of a file. In both cases, pages are freed at the end of files and returned to the file system. Databases can shrink only to the point that no free space is remaining; there is no data compression.

File Grow

Automated file growth reduces the need for database management and eliminates many problems that occur when logs or databases run out of space. When creating a database, an initial size for the file must be given. SQL Server creates the data files based on the size provided by the database creator, and data is added to the database that these files fill. By default, data files are allowed to grow as much as necessary until disk space is exhausted. Alternatively, data files can be configured to grow automatically when they fill with data, but only to a predefined limit. This prevents disk drives from running out of space.

This feature is useful when SQL Server is used as a database embedded in an application, in which the user does not have ready access to a system administrator. The user can let the files grow automatically as needed to lessen the administrative burden of monitoring the amount of free space in the database and allocating additional space manually.

When creating a database, data files should be as large as possible, based on the maximum amount of data expected in the database. You can permit the data files to grow automatically, but place a limit on the growth. If the initial data file size is exceeded and the file starts to grow automatically, reevaluate the expected maximum database size and plan accordingly by adding more disk space or by creating and adding more files or filegroups to the database.

Databases can be prevented from growing beyond their initial size. If the files fill, no more data can be added unless more data files are added. Allowing files to grow automatically can cause fragmentation if many files share the same disk. Therefore, it is recommended that files or filegroups be created on as many different local physical disks as are available. Objects that compete for space should be placed in different filegroups.

Lock Enhancements

Microsoft SQL Server 7.0 offers these types of lock enhancements:

Row-level locking
Dynamic locking
Locking modes

Row-Level Locking

SQL Server 6.5 introduced row-level locking on inserts. SQL Server 7.0 supports full row-level locking for both data rows and index entries. Transactions can update individual records without blocking the pages. Many OLTP applications experience increased concurrency, especially when applications append rows to tables and indexes. The lock manager dynamically adjusts the resources it uses for larger databases, eliminating the need to adjust the locks' server configuration option manually. The lock manager automatically chooses between page locking (preferable for table scans) and row-level locking (preferable for inserting, updating, and deleting data).

Dynamic Locking

SQL Server 7.0 has a superior locking mechanism that is unusual in the database industry: dynamic locking. At run time, the storage engine dynamically cooperates with the query processor to choose the lowest-cost locking strategy based on the characteristics of the schema and query.

Dynamic locking has these advantages:

Simplified database administration, because database administrators no longer need to be concerned with adjusting lock-escalation thresholds
Increased performance, because SQL Server minimizes system overhead by using locks appropriate to the task
Concentrated development by application developers, because SQL Server adjusts locking automatically

Dynamic locking allows different types of resources to be locked by a transaction. To minimize the cost of locking, SQL Server locks resources automatically at a level appropriate to the task. Locking at a smaller granularity, such as rows, increases concurrency but has a higher overhead because more locks must be held if many rows are locked. Locking at a larger granularity, such as tables, is expensive in terms of concurrency because locking an entire table restricts access to any part of the table by other transactions. However, it has a lower overhead because fewer locks are being maintained.

SQL Server can dynamically lock the following resources (thereby increasing granularity).

Resource	Description
RID	A row identifier. Used to lock a single row within a table.
Key	A row lock within an index. Used to protect key ranges in serializable transactions.
Page	An 8-KB data page or index page.
Extent	A contiguous group of eight data pages or index pages.
Table	An entire table, including all data and indexes.
DB	A database.

Locking Modes

SQL Server locks resources by using different locking modes that determine how the resources can be accessed by concurrent transactions. SQL Server uses several resource locking modes:

Shared
Update
Exclusive
Intent
Schema

Shared Locks

Shared locks allow concurrent transactions to read a resource. No other transactions can modify the data while shared locks exist on the resource. Shared locks are released as soon as the data on a resource has been read, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared locks for the duration of the transaction.

Update Locks

Update locks are used on resources that can be updated. Update locks prevent a common form of deadlock. If two transactions acquire shared locks on a resource and then attempt to update data concurrently, an attempted conversion to exclusive locks occurs. Both transactions are waiting for the other to release its shared lock, and a deadlock occurs. Update locks eliminate this problem, because only one transaction at a time can obtain an update lock to a resource.

Exclusive Locks

Exclusive locks are used for data-modification operations, such as updates, inserts, or deletes. Exclusive locks ensure that multiple updates cannot be made to the same resource at the same time (concurrent transactions). No other transactions can read or modify data locked with an exclusive lock.

Intent Locks

Intent locks indicate that SQL Server is attempting to acquire a shared or exclusive lock on some of the resources lower in the hierarchy. Intent locks improve performance because SQL Server must examine intent locks only at the table level to determine if a transaction can safely acquire a lock on that table. This removes the need to examine every row or page lock on the table to determine if a transaction can lock the entire table.

Schema Locks

There are two types of schema locks: Sch-M and Sch-S. Sch-M locks are taken when a table data definition language (DDL) operation is being performed (such as adding a column or dropping a table). Sch-S locks are taken while compiling queries. Sch-S locks do not block out any transactional locks, including exclusive locks. Therefore, other transactions can run while a query is being compiled, even transactions with exclusive locks on a table. However, DDL operations cannot be performed on the table.

Base Table and Index Architecture

Fundamental changes were made in Microsoft SQL Server 7.0 base-table organization that allow the query processor to make use of more secondary indexes, which greatly improves performance for decision support applications. The query optimizer has a wide set of execution strategies, and many of the optimization limitations of earlier versions of SQL Server have been removed. In particular, SQL Server 7.0 is less sensitive to index-selection issues, resulting in less need for tuning.

Table Organization

The data for each table now is stored in a collection of 8-KB data pages. Each data page has a 96-byte header containing system information, such as the ID of the table that owns the page and pointers to the next and previous pages for pages linked in a list. A row-offset table is at the end of the page; data rows fill the rest of the page.

SQL Server 7.0 tables use one of two methods to organize data pages:

Clustered tables, which are tables that have a clustered index. The data rows are stored in order based on the clustered index key. The data pages are linked in a doubly linked list, and the index is implemented as a B-tree index structure that supports fast retrieval of the rows based on their clustered index key values.
Heaps, which are tables that have nonclustered indexes. The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

Index Organization

An index (clustered or nonclustered) speeds retrieval of the rows in a table. An index contains key values built from one or more columns in a table. These keys are stored in a structure that allows SQL Server to find rows associated with the key values quickly and efficiently. If a table has no indexes, the data rows are not stored in any particular order, which is called a heap.

Clustered Indexes

In a clustered index, the order of the values in the index is the same as the order of the data in the table. (To some degree, the storage of the data is the index.) A clustered index is analogous to the alphabetic listings contained in a telephone directory.

Because a clustered index dictates the storage of the table's data, a table can contain only one clustered index; however, an index can be created on multiple columns. For example, a telephone directory is organized by last name, and then by first name.

A clustered index contains a hierarchical tree, with the range of values stored in a given area of the index. When searching for data based on a clustered-index value, SQL Server quickly isolates the page with the specified value, and then searches the page for the records with the specified value. The lowest level, or leaf node, of the index tree is the page that contains the data.

Nonclustered Indexes

A nonclustered index is analogous to an index in a textbook, in which the data is stored in one place and the index is stored in another. Pointers indicate the storage location of the index items in the data. The leaf node of a nonclustered index is the storage location (its page number and offset in the page) of the index entry. Therefore, a nonclustered index has an extra level between the index structure and the data itself.

When SQL Server searches for data based on a nonclustered index, it searches the index for the specified value to obtain the location of the rows of data, and then retrieves the data directly from its storage locations. This makes nonclustered indexes the optimal choice for exact-match queries.

Some nonclustered indexes contain multiple indexes. You can define a nonclustered index for each of the columns commonly used to find data in the table.

Because nonclustered indexes store clustered index keys as pointers to data rows, it is important to keep clustered index keys as small as possible. Avoid choosing large columns as the keys to clustered indexes if a table also has nonclustered indexes.

SQL Server 7.0 supports up to 249 nonclustered indexes on each table. Nonclustered indexes and clustered indexes share a similar B-tree structure. The difference is that nonclustered indexes have no effect on the order of the data rows. The collection of data pages for a heap is not affected if nonclustered indexes are defined for the table.

Index Distribution Statistics

All indexes have distribution statistics that describe the selectivity and distribution of the key values in the index. Selectivity is a property that relates to how many rows are typically identified by a key value. A unique key value has high selectivity; a nonunique key value (one found in 1,000 rows, for example) has poor selectivity.

The selectivity and distribution statistics are used by SQL Server to optimize its navigation through tables when processing Transact-SQL statements. The statistics for each index are not limited to a single page, but are stored as a long string of bits across multiple pages in the same way image data is stored.

Data Type Enhancements

Microsoft SQL Server 7.0 introduces enhancements to data storage and to text and image handling.

Unicode Data Types

SQL Server 7.0 supports Unicode data types, which makes it easier to store data in multiple languages within one database by eliminating conversion of characters and installation of multiple code pages. Unicode stores character data using 2 bytes for each character (rather than 1 byte). There are 65,536 different bit patterns in 2 bytes; therefore, Unicode can use one standard set of bit patterns to encode each character in all languages, including languages such as Chinese that have many characters. Programming languages also support Unicode data types.

The fact that Unicode data needs twice as much storage space is offset by the elimination of the need to convert extended characters between code pages. The new data types that support Unicode are ntext , nchar , and* * nvarchar. They are the same as text , char , and* * varchar except for the wider range of characters supported and the increased storage space used.

Traditional non-Unicode data types in SQL Server allow the use of characters that are defined by a particular character set. A character set is chosen during SQL Server setup and is immutable throughout the life of the installation. By using Unicode data types, a column can store any character that is defined by the Unicode Standard, which includes all characters defined in the various character sets.

Storage of Data Types

Data storage flexibility is improved in SQL Server 7.0 with the expansion of maximum limits for char, varchar, binary, and varbinary data types from 255 bytes to 8,000 bytes. It is no longer necessary to use text and image data types for anything but extremely large data values. The Transact-SQL string functions also support the large char and varchar data types, and the SUBSTRING function can be used to process text and image columns.

Additionally, the handling of nulls and empty strings has been improved. And a new uniqueidentifier data type is provided for storing a globally unique identifier (GUID).

text, ntext, and image Data Types

SQL Server provides a solid foundation for building object-relational features. One SQL Server 7.0 enhancement is for text and image storage, which uses a new, efficient data structure.

The SQL Server 7.0 ntext, text, and image data types are capable of storing extremely large amounts of data (up to 2 GB) in a single value. A single data value is typically larger than can be retrieved in one step by an application; some values may be larger than the virtual memory available on the client. This means that special steps may be necessary to retrieve these values. If the value for these data types is no longer than a Unicode, character, or binary string (4,000 characters, 8,000 characters, or 8,000 bytes), the value can be referenced in SELECT, UPDATE, and INSERT statements much the same as smaller data types.

text, ntext , and image data type values are not stored as part of the data row but in a separate collection of pages. For each value, only a 16-byte pointer is stored. For each row, the pointer points to the location of data. A row containing multiple text, ntext, or image columns has one pointer for each column.

The data is stored in a collection of 8-KB pages that are not necessarily located next to each other. In SQL Server 7.0, the pages are organized logically in a B-tree structure. In earlier versions, they are linked in a page chain. The advantage of the B-tree structure is that operations starting in the middle of the string are more efficient, making the B-tree quicker to navigate. This is in contrast to page chains, which often take a somewhat random path through the files of a database. The structure of the B-tree differs slightly depending on whether there is more or less than 32 KB of data.

Log Manager Architecture

A Microsoft SQL Server 7.0 log file consists of one or more physical files that contain log entries only. Previously, the log file was a system table that used ordinary database pages. These log pages were allocated and deallocated in the same way as pages of other tables, and they competed with data pages for space in the memory cache.

Each log file is divided logically into smaller segments called virtual log files, which are the unit of truncation for the transaction log. When a virtual log file no longer contains log records for active transactions, it can be truncated and the space becomes available for logging new transactions.

SQL Server 7.0 avoids having lots of small virtual log files. The number of virtual log files grows more slowly than their size. If a log file grows in small increments, it usually will have many small virtual log files. If the log file grows in larger increments, SQL Server will create a smaller number of larger virtual log files.

As records are written to the log file, the end of the log file grows from one virtual log file to the next. If there is more than one physical log file for a database, the end of the log file grows through each virtual log file in each physical log file before circling back to the first virtual log file in the first physical log file.

The smallest size for a virtual log file is 256 KB. The minimum size for a transaction log is 512 KB, or two virtual log files of 256 KB each. Both the number and size of the virtual log files in a transaction log increase as the size of the log file increases. A small log file might have a very few small virtual log files. A very large log file will have larger virtual log files.

Log files can grow and shrink automatically. If there is no reusable space available, the log file will be extended by adding another logical log file chunk. If more space is needed, another logical log file is added. Internally, the log manager divides physical log files into logical log files. The logical log files are either active or reusable. A logical log file can be backed up if it does not contain any portion of the active log. A log file can be reused if it has been backed up.

Transaction Log Management

Transaction logs assist in recovering database integrity in the event of system failure. Log records for a single database are maintained on one or more operating-system files called log files, which are serial recordings of all modifications that have occurred in the database and of which transaction performed each modification.

A log file grows continuously as logged operations occur in the database. For some large operations, the file records only the fact that the operation took place. The log file records the commit or rollback of each transaction. This allows SQL Server to roll a transaction either back or forward.

Rolling a transaction back occurs when SQL Server is backing out of an incomplete transaction. SQL Server restores the database to the state it was in before the transaction began by reversing the sequence of alterations.

Rolling a transaction forward occurs when SQL Server is restoring a transaction log. Modifications to the database are applied in the same sequence in which they originally occurred. At the end of this process, the database is in the same state it was in at the time the log was backed up.

The SQL Server 7.0 transaction log manager features these improvements:

Does not compete with data for buffer cache pages
Can be spread over one or more physical files
Grows and shrinks automatically
Allows for quick and nonintrusive truncation
Enables larger I/O operations

Memory Management

Microsoft SQL Server 7.0 dynamically acquires and frees memory as needed. It is no longer necessary for an administrator to specify how much memory should be allocated to SQL Server, although the option still exists.

There are several competing subsystems in SQL Server that must share available memory. The log and recovery systems need memory to read and undo pages, and the query processor needs memory to perform hashing and sorting. Other subsystems that use memory are the procedure cache, buffer pool, lock manager, and data structures. In SQL Server 7.0, these systems allocate the memory they need dynamically and can return memory when it is no longer needed.

Microsoft Windows NT, Microsoft Windows 95, and Microsoft Windows 98 operating systems support virtual memory, a method of extending the available physical memory on a computer. In a virtual memory system, the operating system creates a pagefile, or swapfile, and divides memory into units called pages. Recently referenced pages are located in physical memory, or RAM. If a page of memory is not referenced for a while, it is written, or swapped out, to the pagefile. If the page is later referenced by an application, it is read, or swapped into physical memory, from the pagefile. The total memory available to applications is the amount of RAM plus the size of the pagefile.

One of the primary design goals of database software is minimizing disk I/O, because disk reads and writes are among the most resource-intensive operations on a computer. SQL Server builds a buffer cache in memory to hold pages read from the database. Much of the code in SQL Server is dedicated to minimizing the number of physical reads and writes between the disk and the buffer cache. The larger the buffer cache, the less I/O SQL Server has to do. However, if the buffer cache causes the SQL Server memory requirements to exceed the available physical memory on the server, then the operating system starts swapping memory to and from the pagefile. All that has happened is that the physical I/O to the database files has been traded for physical I/O to the swap file.

Having a lot of physical I/O to the database files is an inherent factor of database software. By default, SQL Server attempts to balance between two goals:

Minimizing or eliminating pagefile I/O to concentrate I/O resources for reads and writes of the database files
Minimizing physical I/O to the database files by maximizing the size of the buffer cache

SQL Server starts with a default memory value. As more applications are started, the system is queried periodically to determine the amount of free physical memory available. SQL Server grows or shrinks the buffer cache to keep free physical memory at around 5 MB, which also prevents Windows NT from paging. If there is less than 5 MB of free memory, then SQL Server releases memory to Windows NT that usually goes on the free list. If there is more than 5 MB of free memory, SQL Server recommits memory to the buffer cache. SQL Server adds memory to the buffer cache when its work load requires more memory; a server at rest does not grow its buffer cache.

Buffer Management and I/O

SQL Server 7.0 uses a clocking mechanism to manage buffer I/O instead of a least recently used list. The clocking mechanism improves performance because there is less synchronization required. This improves scaling on large SMP systems, but does not have much effect on small systems. Query performance is improved because it makes use of both parallel I/O and large I/O on the files that comprise the database object.

The SQL Server: Buffer Manager object provides counters to monitor how SQL Server uses memory to store data pages, internal data structures, the procedure cache, and physical I/O as SQL Server reads database pages from and writes database pages to disk.

Monitoring the memory used by SQL Server can help determine, for example, whether bottlenecks exist due to a lack of available physical memory for storing frequently accessed data in cache, in which case SQL Server must retrieve the data from disk. By monitoring memory, you can determine if query performance can be improved by adding more memory or by making more memory available to the data cache or SQL Server internal structures. Monitoring the physical I/O is especially important to determine how often SQL Server needs to read data from the disk.

Compared to other operations, such as memory access, physical I/O consumes a lot of time. Minimizing physical I/O can improve query performance.

Read Ahead

The read requests generated by the system are controlled by the relational engine and are optimized further by the storage engine. The access method used to read pages from a table determines the general pattern of reads performed. The relational engine determines the most effective access method, such as a table scan, an index scan, or a keyed read. This request is then given to the storage engine, which optimizes the reads required to implement the access method. The reads are requested by the thread executing the batch.

The SQL Server 7.0 read-ahead mechanism is improved and significantly simplified. By putting hints and directions in both the query processor and the storage engine, more than 1,000 lines of code were eliminated, as well as configuration parameters.

Table scans benefit from the new data structures introduced in SQL Server 7.0. In a SQL Server 7.0 database, the storage engine can build a serial list of the disk addresses that must be read. This allows SQL Server to optimize its I/O operations as large sequential reads in disk order. SQL Server issues multiple serial read-ahead reads immediately for each file in the scan. This takes advantage of striped disk sets.

The parallel read-ahead mechanism allows the disk subsystem to work at maximum speed. The inputs and outputs are decoupled, and the mechanism pulls in data pages and then passes them to the next available CPU. I/O operations are simultaneously issued against multiple files.

SQL Server 7.0 eliminates separate read-ahead threads and minimizes context switches. The new allocation data structures allow read ahead without following a page chain. The query processor helps with read ahead by using the middle tier of an index to predict the next page of an index scan (including clustered table scans).

SQL Server 7.0 also reads index pages serially in disk order, thereby improving the performance of index scans. Index processing is further improved by the use of prefetch hints to allow serial read-ahead processing of a nonclustered index.