Chapter 1 - Database Design

There are two components to designing a database: logical and physical. Logical database design involves modeling your business requirements and data using database components such as tables and constraints without regard for how or where the data will be physically stored. Physical database design involves mapping the logical design onto physical media, taking advantage of the hardware and software features available that allow the data to be physically accessed and maintained as quickly as possible, and indexing.

It is important to correctly design the database to model your business requirements and to take advantage of hardware and software features early in the development cycle of a database application because it is difficult to make changes to these components later.

Logical Database Design

Using Microsoft SQL Server effectively begins with normalized database design. Normalization is the process of removing redundancies from the data. For example, when converting from an indexed sequence access method (ISAM) style application, normalization often involves breaking data in a single file into two or more logical tables in a relational database. Transact-SQL queries then recombine the table data by using relational join operations. By avoiding the need to update the same data in multiple places, normalization improves the efficiency of an application and reduces the opportunities for introducing errors due to inconsistent data.

However, there are tradeoffs to normalization. A database that is used primarily for decision support (as opposed to update-intensive transaction processing) may not have redundant updates and may be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, unnormalized data is a more common design problem in database applications than over-normalized data. Starting with a normalized design and then selectively denormalizing tables for specific reasons is a good strategy.

For more information, see "Normalization" in Microsoft SQL Server Database Developer's Companion.

Whatever the database design, you should take advantage of these features in SQL Server to automatically maintain the integrity of your data:

  • CHECK constraints ensure that column values are valid. 

  • DEFAULT and NOT NULL constraints avoid the complexities (and opportunities for hidden application bugs) caused by missing column values. 

  • PRIMARY KEY and UNIQUE constraints enforce the uniqueness of rows (and implicitly create an index to do so). 

  • FOREIGN KEY constraints ensure that rows in dependent tables always have a matching master record. 

  • IDENTITY columns efficiently generate unique row identifiers. 

  • TIMESTAMP columns ensure efficient concurrency checking between multiple-user updates. 

  • User-defined data types ensure consistency of column definitions across the database. 

By taking advantage of these features, you can make the data rules visible to all users of the database, rather than hiding them in application logic. These server-enforced rules help avoid errors in the data that can arise from incomplete enforcement of integrity rules by the application itself. Using these facilities also ensures that data integrity is enforced as efficiently as possible.

See Also 

In Other Volumes 

"Data Integrity" in Microsoft SQL Server Database Developer's Companion 

Database Design Considerations: Data Types

Microsoft SQL Server version 6.5 and earlier automatically implemented fixed-length columns that allow null values as variable-length columns of the same data type. For example, a char column that allowed null values was treated as a varchar column. This resulted in less space being used when the length of the value stored in the column was less than the maximum column length. SQL Server version 7.0 treats any fixed-length column that allows null values as fixed-length. Therefore, a char column that allows null values is treated as a fixed-length char column.

As a result, the same data now takes more disk space to store and can require more I/O and other processing operations in SQL Server 7.0 compared to earlier versions of SQL Server. To resolve this issue and achieve the same results as SQL Server 6.5 and earlier, use variable-length columns rather than fixed-length columns. For example, use a varchar data type instead of a char data type. However, if all the values in a column are the same length or the lengths of the values do not vary by much, it is more efficient to use a fixed-length column.

Text Data Types 

Character strings up to 8,000 bytes in length that needed to be stored using the image data type in SQL Server 6.5 and earlier can now be stored in columns defined with the char and varchar data types. Using a char or varchar data type allows the system-defined character string functions, such as SUBSTRING, to be used on character strings up to 8,000 bytes in length.

See Also 

In Other Volumes 

"Specifying a Column Data Type" in Microsoft SQL Server Database Developer's Companion 

Physical Database Design

The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning at the early stages of your project. The storage engine of a relational database requires much of this planning, including:

  • What type of disk hardware to use, such as RAID (redundant array of independent disks) devices. For more information, see "RAID" in this volume.

  • How to place your data onto the disks. For more information, see "Data Placement Using Filegroups" in this volume.

  • Which index design to use to improve query performance in accessing data. For more information, see "Index Tuning Recommendations" in this volume.

  • How to set all configuration parameters appropriately for the database to perform well. For more information, see "Optimizing Server Performance" in this volume.

RAID

RAID (redundant array of independent disks) is a disk system that comprises multiple disk drives (an array) to provide higher performance, reliability, storage capacity, and lower cost. Fault-tolerant arrays are categorized in six RAID levels, 0 through 5. Each level uses a different algorithm to implement fault tolerance.

While RAID is not a part of Microsoft SQL Server, its implementation can directly affect SQL Server's performance. RAID levels 0, 1, and 5 are typically used with SQL Server.

Note RAID is not available on Microsoft Windows 95/98.

A hardware disk array improves I/O performance because I/O functions, such as striping and mirroring, are handled efficiently in firmware. Conversely, an operating system-based RAID offers lower cost but consumes processor cycles. When cost is a consideration and redundancy and high performance are required, Microsoft Windows NT stripe sets with parity are a good solution.

Data striping (RAID 0) is the RAID configuration with the highest performance, but if one disk fails, all the data on the stripe set becomes inaccessible. A common installation technique for relational database management systems is to configure the database on a RAID 0 drive and then place the transaction log on a mirrored drive (RAID 1). You can get the best disk I/O performance for the database and maintain data recoverability (assuming you perform regular database backups) through a mirrored transaction log.

If data must be quickly recoverable, consider mirroring the transaction log and placing the database on a RAID 5 disk. RAID 5 provides redundancy of all data on the array, allowing a single disk to fail and be replaced in most cases without system downtime. RAID 5 offers lower performance than RAID 0 or RAID 1 but higher reliability and faster recovery.

Developing a Drive Performance Strategy

By managing the placement of data on drives, you can both improve performance and implement fault tolerance. In the context of managing drive storage for Microsoft SQL Server installation, performance refers in part to the speed of read and write operations, and fault tolerance refers to the ability of the system to continue functioning without data loss when part of the system fails. You can use the following methods to manage the placement of data on disk drives:

  • Hardware-based RAID (redundant array of independent disks) above level 0 can protect against data loss in the event of media failure, and can improve performance. For more information, see your vendor's documentation. 

  • Both Microsoft Windows NT-based disk striping and striping with parity can improve performance. Disk striping with parity also protects against data loss in the event of media failure. 

  • Windows NT-based disk mirroring and duplexing are both fault-tolerance mechanisms that protect against data loss in the event of media failure. They can also improve read performance. 

Important These fault-tolerance methods do not replace proper backup strategies. You must perform periodic backups to protect your databases and data against catastrophic loss.

For more information about Windows NT disk striping, mirroring, and duplexing, see your Windows NT documentation.

See Also 

In Other Volumes 

"Backing Up and Restoring Databases" in Microsoft SQL Server Administrator's Companion 

RAID Levels and SQL Server

RAID (redundant array of independent disks) levels 0, 1, and 5 are typically implemented with Microsoft SQL Server.

Note RAID levels greater than 10 (1 + 0) offer additional fault tolerance or performance enhancements. These tend to be proprietary systems. For more information about these types of RAID systems, contact the hardware vendor.

Level 0 

This level is also known as disk striping because of its use of a disk file system called a stripe set. Data is divided into blocks and spread in a fixed order among all disks in an array. RAID 0 improves read/write performance by spreading operations across multiple disks, so that operations can be performed independently and simultaneously.RAID 0 is similar to RAID 5, but RAID 5 also provides fault tolerance. Cc917578.raid0(en-us,TechNet.10).gif

Level 1 

This level is also known as disk mirroring* *because of its use of a disk file system called a mirror set. Disk mirroring provides a redundant, identical copy of a selected disk. All data written to the primary disk is written to the mirror disk. RAID 1 provides fault tolerance and generally improves read performance (but may degrade write performance).Cc917578.raid1(en-us,TechNet.10).gif

Level 2 

This level adds redundancy by using an error correction method that spreads parity across all disks. It also employs a disk-striping strategy that breaks a file into bytes and spreads it across multiple disks. This strategy offers only a marginal improvement in disk utilization and read/write performance over mirroring (RAID 1). RAID 2 is not as efficient as other RAID levels and is not generally used.

Level 3 

This level uses the same striping method as RAID 2, but the error correction method requires only one disk for parity data. Use of disk space varies with the number of data disks. RAID 3 provides some read/write performance improvement.

Level 4 

This level employs striped data in much larger blocks or segments than RAID 2 or RAID 3. Like RAID 3, the error correction method requires only one disk for parity data. It keeps user data separate from error-correction data. RAID 4 is not as efficient as other RAID levels and is not generally used.

Level 5 

Also known as striping with parity, this level is the most popular strategy for new designs. It is similar to RAID 4 in that it stripes the data in large blocks across the disks in an array. It differs in that it writes the parity across all the disks. Data redundancy is provided by the parity information. The data and parity information are arranged on the disk array so that the two are always on different disks. Striping with parity offers better performance than disk mirroring (RAID 1). However, when a stripe member is missing, read performance degrades (for example, when a disk fails).Cc917578.raid5(en-us,TechNet.10).gif

Level 10 (1+0) 

This level is also known as mirroring with striping. This level uses a striped array of disks, which are then mirrored to another identical set of striped disks. For example, a striped array can be created using five disks. The striped array of disks is then mirrored using another set of five striped disks. RAID 10 provides the performance benefits of disk striping with the disk redundancy of mirroring. RAID 10 provides the highest read/write performance of any of the RAID levels at the expense of using twice as many disks.

Comparing Different Implementations of RAID Levels

There are advantages and disadvantages to using the various implementations of RAID (redundant array of independent disks).

RAID implementation

Advantage

Disadvantage

Microsoft Windows NT-based striping

No added hardware cost.

Uses system processing resources.

Hardware-based striping

Does not compete for processor cycles. Best performance of all RAID implementations.

Additional cost of specialized hardware.

RAID solutions typically used with Microsoft SQL Server provide varying levels of redundancy and fault tolerance.

RAID implementation

Advantage

Disadvantage

Hardware-based RAID 3, 5, or 10

Excellent performance. Does not compete for processor cycles.

Cost.

Hardware-based RAID 1

Excellent redundancy. Does not compete for processor cycles.

Additional cost due to more hardware.

Hardware-based RAID 10

Excellent performance. Excellent redundancy.

Additional cost due to more hardware.

Windows NT-based RAID 1

Good redundancy. Low cost.

Uses system processing resources.

Windows NT-based RAID 5

Excellent read performance. Low cost.

Uses system processing resources.

About Hardware-based Solutions

RAID (redundant array of independent disks) levels 0, 1, 3, and 5 are the levels typically implemented in hardware-based solutions.

Hardware-based RAID uses an intelligent drive controller and a redundant array of disk drives to protect against data loss in the event of media failure and to improve the performance of read/write operations. A disk array is an effective disk-storage solution for computers running Microsoft SQL Server.

Hardware-based RAID levels 1 through 5 automate redundancy and fault tolerance at the hardware level. All levels (0 through 5) incur no overhead on the system processor. Individual data files are typically spread across more than one disk. It is possible to implement a hardware-based RAID solution that provides your system with seamless, nonstop recovery from media failure.

In general, hardware-based RAID offers performance advantages over Microsoft Windows NT software-based RAID. For example, you can improve data throughput significantly by implementing RAID 5 through hardware that does not use system software resources. This is accomplished by using more disks at a given capacity than in conventional storage solution. Read/write performance and total storage size can be further improved by using multiple controllers.

Depending on the configuration, hardware-based RAID generally provides good performance. It also makes it much easier to manage multiple disks, allowing you to treat an array of disks as one disk. You may even be able to replace a failed drive without shutting down the system. The disadvantages of a hardware-based solution are cost, and it may lock you into a single vendor.

For more information about implementing hardware-based RAID, contact the hardware vendor.

About Windows NT-based Disk Striping and Striping with Parity

Microsoft Windows NT-based disk striping and striping with parity implement RAID features in software, using any hardware compatible with the operating system. Because these are software-based solutions provided with the operating system, they offer a cost advantage.

  • Disk striping writes data in stripes across a volume (created from areas of free space). For more information about volumes, see your Windows NT documentation. 

    These areas are all the same size and are spread over an array of disks (up to 32 disks). Striping writes files across all disks, so data is added to all partitions in the set at the same rate.

    Windows NT-based disk striping implements RAID 0. Disk striping provides the best performance of all Windows NT Server disk-management strategies, but does not provide any fault-tolerance protection.

  • Disk striping with parity is similar to disk striping. Disk striping with parity adds a parity-information stripe to each disk partition in the volume. This provides fault-tolerance protection equivalent to that of disk mirroring, but requires much less space for the redundant data. Windows NT-based disk striping with parity implements RAID 5. 

When a member of a stripe set with parity fails in a severe manner (for example, from a loss of power or a complete head crash), you can regenerate the data for that member of the stripe set from the remaining members.

Stripe sets with parity are a good solution for data redundancy in a computing environment in which most activity consists of reading data. Disk stripe sets with parity also improve write performance, but not as much as striping alone. Creating a disk stripe set with parity requires at least three physical disks on the server.

Disk striping is available on both Windows NT Server and Windows NT Workstation. However, disk striping with parity is supported only for Windows NT Server. On a dual-boot computer, stripe sets, including those with parity, are not accessible when running the Microsoft MS-DOSĀ® operating system.

Disk striping with parity is recommended over mirroring for applications that require redundancy and are read-oriented, although disk striping with parity requires more system memory.

Disk striping and disk striping with parity are set up and managed using the Windows NT Disk Administrator application, which can be started from the Administrative Tools program group.

For more information about setting up disk striping or disk striping with parity, see your Windows NT Server documentation.

About Windows NT-based Disk Mirroring and Duplexing

Microsoft Windows NT-based disk mirroring and duplexing implement RAID (redundant array of independent disks) features in software, using any hardware compatible with the operating system. Because these are software-based solutions provided with the operating system, they offer a cost advantage.

  • Disk mirroring protects against media failure by maintaining a fully redundant copy of a partition on another disk. This provides protection from the downtime and expense involved in recovering lost data and restoring data from a backup storage facility. In a sense, mirroring is continual backup. Mirroring also provides some performance benefits when reading data from disks under heavy I/O loads. Windows NT-based disk mirroring implements RAID 1.

  • Disk duplexing is a form of mirroring that provides protection against controller failures (in addition to protecting against media failures) by using a different disk controller on the mirror disk.

Disk mirroring and duplexing are features of Windows NT Server. They are not supported for Windows NT Workstation. On a dual-boot computer, they are not accessible when running the Microsoft MS-DOS operating system.

Windows NT-based disk mirroring, or duplexing, offers better write performance than Windows NT-based disk striping with parity. It also requires less system memory and does not show performance degradation during a failure.

The entry cost of Windows NT-based disk mirroring or duplexing is lower because it requires only two or more disks (compared to disk striping with parity, which requires three or more disks). However, mirroring provides less usable disk space (compared to disk striping with parity), so the cost per megabyte is higher.

Disk mirroring and duplexing are implemented by using the Windows NT Disk Administrator application, which can be started from the Administrative Tools program group.

For more information about setting up disk mirroring or duplexing, see your Windows NT Server documentation.

Note The term mirroring is frequently used in Windows NT Server documentation to describe both disk mirroring and duplexing.

Partitioning

Partitioning a database improves performance and eases maintenance. By splitting a large table into smaller, individual tables, queries accessing only a fraction of the data can run faster because there is less data to scan. Maintenance tasks, such as rebuilding indexes or backing up a table, can execute more quickly.

Partitioning can be achieved without splitting tables by physically placing them on individual disk drives. Placing a table on one physical drive and related tables on a separate drive, for example, can improve query performance because when queries involving joins between the tables are executed, multiple disk heads read data at the same time. Microsoft SQL Server filegroups can be used to specify on which disks to place the tables.

Hardware Partitioning 

Hardware partitioning designs the database to take advantage of the available hardware architecture. Examples of hardware partitioning include:

  • Multiprocessors that allow multiple threads of execution, permitting many queries to execute at the same time. Alternatively, a single query may be able to run faster on multiple processors by parallelizing. For example, each table referenced in the query can be scanned at the same time by a different thread. 

  • RAID (redundant array of independent disks) devices that allow data to be striped across multiple disk drives, permitting faster access to the data because more read/write heads read data at the same time. A table striped across multiple drives can typically be scanned faster than the same table stored on one drive. Alternatively, storing tables on separate drives from related tables can significantly improve the performance of queries joining those tables. 

Horizontal Partitioning 

Horizontal partitioning segments a table into multiple tables, each containing the same number of columns but fewer rows. For example, a table containing 1 billion rows could be partitioned horizontally into 12 tables, with each smaller table representing one month of data for a given year. Any queries requiring a specific month's data reference the appropriate table only.

Determining how to partition the tables horizontally depends on how data is analyzed. Partition the tables so that queries reference as few tables as possible. Otherwise, excessive UNION queries, used to merge the tables logically at query time, can impair performance. For more information about querying horizontally partitioned tables, see "Scenarios for Using Views" in Microsoft SQL Server Database Developer's Companion.

Partitioning data horizontally based on age/use is common. For example, a table may contain data for the last five years, but only data from the current year is regularly accessed. In this case, you may consider partitioning the data into five tables, with each table containing data from only one year.

Vertical Partitioning 

Vertical partitioning segments a table into multiple tables containing fewer columns. The two types of vertical partitioning are normalization and row splitting.

Normalization is the standard database process of removing redundant columns from a table and placing them in secondary tables linked to the primary table by primary key and foreign key relationships.

Row splitting divides the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the others. For example, joining the tenth row from each split table re-creates the original row.

Like horizontal partitioning, vertical partitioning allows queries to scan less data, hence increasing query performance. For example, a table containing seven columns, of which only the first four are commonly referenced, may benefit from splitting the last three columns into a separate table.

Vertical partitioning should be considered carefully because analyzing data from multiple partitions requires queries joining the tables, possibly affecting performance if partitions are very large.

See Also 

In Other Volumes 

"Using Views with Partitioned Data" in Microsoft SQL Server Database Developer's Companion 

Data Placement Using Filegroups

Microsoft SQL Server allows you to create tables or indexes on a specific filegroup within your database, rather than across all filegroups in a database. By creating a filegroup on a specific disk or RAID (redundant array of independent disks) device, you can control where tables and indexes in your database are physically located. Reasons for placing tables and indexes on specific disks include:

  • Improved query performance. 

  • Parallel queries. 

See Also 

In Other Volumes 

"Files and Filegroups" in Microsoft SQL Server Database Developer's Companion 

Placing Tables on Filegroups

A table can be created on a specific filegroup rather than the default filegroup. If the filegroup comprises multiple files spread across various physical disks, each with its own disk controller, then queries for data from the table will be spread across the disks, thereby improving performance. The same effect can be accomplished by creating a single file on a RAID (redundant array of independent disks) level 0, 1, or 5 device.

If the computer has multiple processors, Microsoft SQL Server can perform parallel scans of the data. Multiple parallel scans can be executed for a single table if the table's filegroup contains multiple files. Whenever a table is accessed sequentially, a separate thread is created to read each file in parallel. For example, a full scan of a table created on a filegroup comprising of four files will use four separate threads to read the data in parallel. Therefore, creating more files per filegroup can help increase performance because a separate thread is used to scan each file in parallel. Similarly, when a query joins tables on different filegroups, each table can be read in parallel, thereby improving query performance.

Additionally, any text, ntext, or image columns within a table can be created on a filegroup other than the one that contains the base table.

Eventually there is a saturation point when there are too many files and therefore too many parallel threads causing bottlenecks in the disk I/O subsystem. These bottlenecks can be identified by using Windows NT Performance Monitor to monitor the PhysicalDisk object and Disk Queue Length counter. If the Disk Queue Length counter is greater than three, consider reducing the number of files. For more information, see "Monitoring Disk Activity" in Microsoft SQL Server Administrator's Companion. 

It is advantageous to get as much data spread across as many physical drives as possible in order to improve throughput through parallel data access using multiple files. To spread data evenly across all disks, first set up hardware-based disk striping, and then use filegroups to spread data across multiple hardware stripe sets if needed.

See Also 

In This Volume 

Placing Indexes on Filegroups

In Other Volumes 

"Files and Filegroups" in Microsoft SQL Server Database Developer's Companion 

"Using File or Filegroup Backups" in Microsoft SQL Server Administrator's Companion 

Placing Indexes on Filegroups

By default, indexes are created on the same filegroup as the base table on which the index is created. However, it is possible to create nonclustered indexes on a filegroup other than the filegroup of the base table. By creating the index on a different filegroup, you can realize performance gains if the filegroups make use of different physical drives with their own controllers. Data and index information can then be read in parallel by multiple disk heads. For example if Table_A on filegroup f1 and Index_A on filegroup f2 are both being used by the same query, performance gains can be achieved because both filegroups are being fully utilized with no contention. However, if Table_A is scanned by the query but Index_A is not referenced, only filegroup f1 is utilized, resulting in no performance gain.

However, because you cannot predict what type of access will take place and when, it can be a safer decision to spread your tables and indexes across all filegroups. This will guarantee that all disks are being accessed since all data and indexes are spread evenly across all disks, no matter which way the data is accessed. This is also a simpler approach for system administrators.

If there is a clustered index on a table, the data and the clustered index always reside in the same filegroup. Therefore, you can move a table from one filegroup to another by creating a clustered index on the base table that specifies a different filegroup on which to create the index (the index can then be dropped, leaving the base table in the new filegroup).

If a table's indexes span multiple filegroups, all filegroups containing the table and its indexes must be backed up together, after which a transaction log backup must be created. Otherwise, only some of the indexes may be backed up, preventing the index from being recovered if the backup is restored later.

Note An individual table or index can belong to only one filegroup; it cannot span filegroups.

For more information about creating file and filegroup backups, see "Creating File or Filegroup Backups" in Microsoft SQL Server Administrator's Companion.

See Also 

In Other Volumes 

"Files and Filegroups" in Microsoft SQL Server Database Developer's Companion 

Index Tuning Recommendations

Indexes can be dropped, added, and changed without affecting the database schema or application design. Efficient index design is paramount to achieving good performance. For these reasons, you should not hesitate to experiment with different indexes. The Index Tuning Wizard can be used to analyze your queries and suggest the indexes that should be created. For more information, see "Index Tuning Wizard" in Microsoft SQL Server Database Developer's Companion.

The Microsoft SQL Server query optimizer reliably chooses the most effective index in the majority of cases. The overall index design strategy should provide a good selection of indexes to the query optimizer and trust it to make the right decision. This reduces analysis time and results in good performance over a wide variety of situations.

Do not always equate index usage with good performance, and vice-versa. If using an index always produced the best performance, the query optimizer's job would be simple. In reality, incorrect choice of indexed retrieval can result in less than optimal performance. Therefore, the query optimizer's task is to select indexed retrieval only when it will improve performance and to avoid indexed retrieval when it will affect performance.

Recommendations for creating indexes include:

  • Write queries that update as many rows as possible in a single statement, rather than using multiple queries to update the same rows. By using only one statement, optimized index maintenance can be exploited. 

  • Use the Index Tuning Wizard to analyze your queries and make index recommendations. For more information, see "Index Tuning Wizard" in Microsoft SQL Server Database Developer's Companion. 

  • Use integer keys for clustered indexes. Additionally, clustered indexes benefit from being created on unique, nonnull, or IDENTITY columns. For more information, see "Using Clustered Indexes" in Microsoft SQL Server Database Developer's Companion.

  • Create nonclustered indexes on all columns frequently used in queries. This can maximize the use of covered queries. For more information, see "Using Nonclustered Indexes" in Microsoft SQL Server Database Developer's Companion.

    The time taken to physically create an index is largely dependent on the disk subsystem. Important factors to consider are:

    • RAID (redundant array of independent disks) level used to store the database and transaction log files. 

    • Number of disks in the disk array (if RAID was used). 

    • Size of each data row and the number of rows per page. This determines the number of data pages that must be read from disk to create the index. 

    • The columns in the index and the data types used. This determines the number of index pages that have to be written to disk. 

  • Examine column uniqueness. For more information, see "Using Unique Indexes" in Microsoft SQL Server Database Developer's Companion. 

  • Examine data distribution in indexed columns. Often, a long-running query is caused by indexing a column with few unique values, or by performing a join on such a column. This is a fundamental problem with the data and query itself and usually cannot be resolved without identifying this situation. For example, a physical telephone directory sorted alphabetically on last name will not expedite locating a person if all people in the city are named Smith or Jones. 

See Also 

In Other Volumes 

"Designing an Index" in Microsoft SQL Server Database Developer's Companion 

"Statistical Information" in Microsoft SQL Server Database Developer's Companion 

Optimizing Transaction Log Performance

General recommendations for creating transaction log files include:

  • Create the transaction log on a physically separate disk or RAID (redundant array of independent disks) device. The transaction log file is written serially; therefore, using a separate, dedicated disk allows the disk heads to stay in place for the next write operation. 

  • Set the original size of the transaction log file to a reasonable size to prevent the file from automatically expanding as more transaction log space is needed. As the transaction log expands, a new virtual log file is created, and write operations to the transaction log wait while the transaction log is expanded. If the transaction log expands too frequently, performance can be affected. 

  • Set the file growth increment percentage to a reasonable size to prevent the file from growing by too small a value. If the file growth is too small compared to the number of log records being written to the transaction log, then the transaction log may need to expand constantly, affecting performance. 

  • Manually shrink the transaction log files rather than allowing Microsoft SQL Server to shrink the files automatically. Shrinking the transaction log can affect performance on a busy system due to the movement and locking of data pages. 

See Also 

In Other Volumes 

"Transaction Logs" in Microsoft SQL Server Database Developer's Companion 

"Virtual Log Files" in Microsoft SQL Server Database Developer's Companion 

Optimizing tempdb Performance

General recommendations for the physical placement and database options set for the tempdb database include:

  • Allow the tempdb database to automatically expand as needed. This ensures that queries that generate larger than expected intermediate result sets stored in the tempdb database are not terminated before execution is complete. 

  • Set the original size of the tempdb database files to a reasonable size to avoid the files from automatically expanding as more space is needed. If the tempdb database expands too frequently, performance can be affected.

  • Set the file growth increment percentage to a reasonable size to avoid the tempdb database files from growing by too small a value. If the file growth is too small compared to the amount of data being written to the tempdb database, then tempdb may need to constantly expand, thereby affecting performance. 

  • Place the tempdb database on a fast I/O subsystem to ensure good performance. Stripe the tempdb database across multiple disks for better performance. Use filegroups to place the tempdb database on disks different from those used by user databases. 

See Also 

In Other Volumes 

"Expanding the Database" in Microsoft SQL Server Database Developer's Companion 

File Systems

Server performance is not affected by the file system used (FAT or NTFS). Your choice of file system should be determined by factors other than performance.

  • The File Allocation Table (FAT) file system allows dual booting with computers running Microsoft MS-DOS or Microsoft Windows 95/98. 

  • The Microsoft Windows NT file system (NTFS) has security and recovery advantages.

    If you do not need to dual-boot Windows NT with MS-DOS or Windows 95/98, NTFS is recommended. 

Warning Microsoft SQL Server data and transaction log files must not be placed on compressed file systems.

For more information about choosing the appropriate file system, see your operating system documentation.

Note When running on Windows NT, SQL Server performance can be improved further if the databases are created on disks formatted using NTFS and, specifically, 64-KB extent sizes. For more information about formatting an NTFS disk, see your Windows NT documentation.

Cc917578.spacer(en-us,TechNet.10).gif