Chapter 8 - Database Architecture

Article
01/28/2010

Microsoft SQL Server data is stored in databases. The data in a database is organized into the logical components visible to users. A database is also physically implemented as two or more files on disk.

When using a database, you work primarily with the logical components such as tables, views, procedures, and users. The physical implementation of files is largely transparent. Typically, only the database administrator needs to work with the physical implementation.

Cc917544.daart1(en-us,TechNet.10).gif

Each SQL Server installation has multiple databases. SQL Server has four system databases (master, model, tempdb, and msdb) and each SQL Server installation has one or more user databases. Some organizations have only one user database, containing all the data for their organization. Some organizations have different databases for each group in their organization, and sometimes a database used by a single application. For example, an organization could have one database for sales, one for payroll, one for a document management application, and so on. Sometimes an application uses only one database; other applications may access several databases.

Cc917544.daart2(en-us,TechNet.10).gif

It is not necessary to run multiple copies of SQL Server to allow multiple users to access the databases on a server. SQL Server is capable of handling thousands of users working in multiple databases on the same server at the same time. SQL Server makes all databases on the server available to all users that connect to the server, subject to the defined security permissions.

When connecting to SQL Server, your connection is associated with a particular database on the server. This database is called the current database. You are usually connected to a database defined as your default by the system administrator, although you can use connection options in the database APIs to specify another database. You can switch from one database to another with the Transact-SQL USE database_name statement, or you can use an API function that changes your current database context.

SQL Server version 7.0 allows you to detach databases from a server, then reattach them to another server, or even attach the database back to the same server. If you have a SQL Server database file, you can tell SQL Server when you connect that you want that database file attached with a specific database name.

See Also

In Other Volumes

"Database Design Considerations" in Microsoft SQL Server Database Developer's Companion

Logical Database Components

The data in a Microsoft SQL Server database is organized into several different objects. These objects are what a user can see when they connect to the database.

In SQL Server, these components are defined as objects:

Constraints	Tables
Defaults	Triggers
Indexes	User-defined data types
Keys	Views
Stored procedures

Data Types and Table Structures

All the data in Microsoft SQL Server databases is contained in objects called tables. Each table represents some type of object meaningful to the users. For example, in a school database you would expect to find tables such as a class table, an instructor table, and a student table.

SQL Server tables have two main components:

Columns

Each column represents some attribute of the object modeled by the table, such as a parts table having columns for ID, color, and weight.
Rows

Each row represents an individual occurrence of the object modeled by the table. For example, the parts table would have one row for each part carried by the company.

Cc917544.daart3(en-us,TechNet.10).gif

Because each column represents one attribute of an object, the data in each occurrence of the column is similar. One of the properties of a column is called its data type, which defines the type of data the column can hold. SQL Server has several base data types.

binary	bit	char	datetime	decimal
float	image	int	money	nchar
ntext	nvarchar	numeric	real	smalldatetime
smallint	smallmoney	sysname	text	timestamp
tinyint	varbinary	varchar	uniqueidentifier

Users can also create their own user-defined data types, for example:

-- Create a birthday data type that allows nulls.
EXEC sp_addtype birthday, datetime, 'NULL'
GO
-- Create a table using the new data type.
CREATE TABLE employee
(emp_id char(5),
emp_first_name char(30),
emp_last_name char(40),
emp_birthday birthday)

A user-defined data type makes a table structure more meaningful to a reader and helps ensure that columns holding similar classes of data have the same base data type.

A domain is the set of all allowable values in a column. It includes not only the concept of enforcing data types, but also the values allowed in the column. For example, a part color domain would include both the data type, such as char(6), and the character strings allowed in the column, such as Red, Blue, Green, Yellow, Brown, Black, White, Teal, Grey, and Silver. Domain values can be enforced through mechanisms such as CHECK constraints and triggers.

Columns can either accept or reject NULL values. NULL is a special value in databases which represents the concept of an unknown value. NULL is not the same as a blank character or 0. Blank is actually a valid character, and zero is a valid number, while NULL just represents the idea that we do not know what this value is. NULL is also different from a zero length string. If a column definition contains the NOT NULL clause, you cannot insert rows having the value NULL for that row. If the column definition just has the NULL keyword, it accepts NULL values.

Allowing NULL values in a column can increase the complexity of any logical comparisons using the column. The SQL-92 standard states that any comparison against a NULL value does not evaluate to TRUE or FALSE, it evaluates to UNKNOWN. This introduces three value logic to comparison operators, which can be difficult to manage correctly.

SQL Server stores the data defining the configuration of the server and all its tables in a special set of tables known as system tables. Users should not query or update the system tables directly. Only SQL Server should reference the system tables in response to administration commands issued by users. The system tables can change from version to version; applications referencing system tables directly may have to be rewritten before they can be upgraded to a newer version of SQL Server with a different version of the system tables.

SQL Server supports temporary tables. These are tables whose names start with a number sign (#). If a temporary table is not dropped when the user disconnects, it is dropped automatically by SQL Server. Temporary tables are not stored in the current database, they are instead stored in the system database tempdb. There are two types of temporary tables:

Local temporary tables have only one number sign (#) at the start of their name. They are visible only to the connection that created them.
Global temporary tables have a double number sign (##) at the start of their name. They are visible to all connections. If they are not dropped explicitly before the connection that created them disconnects, they are dropped as soon as all other tasks stop referencing them. No new tasks can reference a global temporary table after the connection that created it disconnects. The association between a task and a table is always dropped when the current statement completes executing, so global temporary tables are generally dropped soon after the connection that created them disconnects.

Users work with the data in tables using data manipulation language (DML) SQL statements:

-- Get a list of all employees named Smith:
SELECT emp_first_name, emp_last_name
FROM employee
WHERE emp_last_name = 'Smith'

-- Delete an employee who quit:
DELETE employee
WHERE emp_id = 'OP123'

-- Add a new employee:
INSERT INTO employee
VALUES ( 'OP456', 'Dean', 'Straight', '01/01/1960')

-- Change an employee name:
UPDATE employee
SET emp_last_name = 'Smith'
WHERE emp_id = 'OP456'

See Also

In Other Volumes

"Specifying a Column Data Type" in Microsoft SQL Server Database Developer's Companion

"Tables" in Microsoft SQL Server Database Developer's Companion

SQL Views

A view can be thought of as either a virtual table or a stored query. The data accessible through a view is not stored in the database as a distinct object. What is stored in the database is a SELECT statement. The result set of the SELECT statement forms the virtual table returned by the view. A user can use this virtual table by referencing the view name in Transact-SQL statements the same way a table is referenced. A view is used to do any or all of these functions:

Restrict a user to specific rows in a table.

For example, let an employee see only the rows recording their work in a labor-tracking table.
Restrict a user to specific columns.

For example, let employees who do not work in payroll see the name, office, work phone, and department columns in an employee table, but do not let them see any columns with salary information or personal information.
Join columns from multiple tables so that they look like a single table.
Aggregate information instead of supplying details.

For example, present the sum of a column, or the maximum or minimum value from a column.

Views are created by defining the SELECT statement that retrieves data to be presented by the view. The data tables referenced by the SELECT statement are known as the base tables for the view. titleview in the pubs database is an example of a view that selects data from three base tables to present a virtual table of commonly needed data.

CREATE VIEW titleview
AS
SELECT title, au_ord, au_lname, price, ytd_sales, pub_id
FROM authors AS a
JOIN titleauthor AS ta ON (a.au_id = ta.au_id)
JOIN titles AS t ON (t.title_id = ta.title_id)

You can then reference titleview in statements in the same way you would reference a table.

SELECT *
FROM titleview

A view can reference another view. For example, titleview presents information that is useful for managers, but a company typically only discloses year-to-date figures in quarterly or annual financial statements. A view can be built that selects all the titleview columns except au_ord and ytd_sales. This new view can be used by customers to get lists of available books without seeing the financial information:

CREATE VIEW Cust_titleview
AS
SELECT title, au_lname, price, pub_id
FROM titleview

Views in Microsoft SQL Server are updatable (can be the target of UPDATE, DELETE, or INSERT statements) so long as the modification only affects one of the base tables referenced by the view.

-- Increase the prices for publisher '0736' by 10%.
UPDATE titleview
SET price = price * 1.10
WHERE pub_id = '0736'
GO

See Also

In Other Volumes

"Views" in Microsoft SQL Server Database Developer's Companion

SQL Stored Procedures

A stored procedure is a group of Transact-SQL statements compiled into a single execution plan.

Microsoft SQL Server stored procedures return data in four ways:

Output parameters, which can return either data (such as an integer or character value) or a cursor variable (cursors are result sets that can be retrieved one row at a time).
Return codes, which are always an integer value.
A result set for each SELECT statement contained in the stored procedure or any other stored procedures called by the stored procedure.
A global cursor that can be referenced outside the stored procedure.

Stored procedures assist in achieving a consistent implementation of logic across applications. The SQL statements and logic needed to perform a commonly performed task can be designed, coded, and tested once in a stored procedure. Each application needing to perform that task can then simply execute the stored procedure. Coding business logic into a single stored procedure also offers a single point of control for ensuring that business rules are correctly enforced.

Stored procedures can also improve performance. Many tasks are implemented as a series of SQL statements. Conditional logic applied to the results of the first SQL statements determines which subsequent SQL statements are executed. If these SQL statements and conditional logic are written into a stored procedure, they become part of a single execution plan on the server. The results do not have to be returned to the client to have the conditional logic applied; all of the work is done on the server. The IF statement in this example shows embedding conditional logic in a procedure to keep from sending a result set to the application:

IF (@QuantityOrdered < (SELECT QuantityOnHand
FROM Inventory
WHERE PartID = @PartOrdered) )
BEGIN
-- SQL statements to update tables and process order.
END
ELSE
BEGIN
-- SELECT statement to retrieve the IDs of alternate items
-- to suggest as replacements to the customer.
END

Applications do not need to transmit all of the SQL statements in the procedure: they only have to transmit an EXECUTE or CALL statement containing the name of the procedure and the values of the parameters.

Stored procedures can also shield users from needing to know the details of the tables in the database. If a set of stored procedures supports all of the business functions users need to perform, users never need to access the tables directly; they can just execute the stored procedures that model the business processes with which they are familiar.

An illustration of this use of stored procedures is the SQL Server system stored procedures used to insulate users from the system tables. SQL Server includes a set of system stored procedures whose names usually start with sp_. These system stored procedures support all of the administrative tasks required to run a SQL Server system. You can administer a SQL Server system using the Transact-SQL administration-related statements (such as CREATE TABLE) or the system stored procedures, and never need to directly update the system tables.

In earlier versions of SQL Server, stored procedures were a way to partially precompile an execution plan. At the time the stored procedure was created, a partially compiled execution plan was stored in a system table. Executing a stored procedure was more efficient than executing an SQL statement because SQL Server did not have to compile an execution plan completely, it only had to finish optimizing the stored plan for the procedure. Also, the fully compiled execution plan for the stored procedure was retained in the SQL Server procedure cache, meaning that subsequent executions of the stored procedure could use the precompiled execution plan.

SQL Server version 7.0 introduces a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 7.0 does not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time like any other Transact-SQL statement. SQL Server 7.0 retains execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans. It uses an efficient algorithm for comparing new Transact-SQL statements with the Transact-SQL statements of existing execution plans. If SQL Server 7.0 determines that a new Transact-SQL statement matches the Transact-SQL statement of an existing execution plan, it reuses the plan. This reduces the relative performance benefit of precompiling stored procedures by extending execution plan reuse to all SQL statements.

SQL Server 7.0 offers new alternatives for processing SQL statements. For more information, see "Query Processor Architecture" in this volume.

SQL Server also supports temporary stored procedures that, like temporary tables, are dropped automatically when you disconnect. Temporary stored procedures are stored in tempdb and are useful when connected to earlier versions of SQL Server. Temporary stored procedures can be used in the case where an application builds dynamic Transact-SQL statements that are executed several times. Rather than have the Transact-SQL statements recompiled each time, you can create a temporary stored procedure that is compiled on the first execution, then execute the precompiled plan multiple times. Heavy use of temporary stored procedures, however, can lead to contention on the system tables in tempdb.

However, two features of SQL Server 7.0 eliminate the need for using temporary stored procedures:

SQL Server 7.0 can reuse execution plans from prior SQL statements. This is especially powerful when coupled with the use of the new system stored procedure sp_executesql.
SQL Server 7.0 natively supports the prepare/execute model of OLE DB and ODBC without using any stored procedures.

For more information about alternatives to using temporary stored procedures, see "Execution Plan Caching and Reuse" in this volume.

This simple stored procedure example illustrates three ways stored procedures can return data:

It first issues a SELECT statement that returns a result set summarizing the order activity for the stores in the sales table.
It then issues a SELECT statement that fills an output parameter.
Finally, it has a RETURN statement with a SELECT statement that returns an integer. Return codes are generally used to pass back error checking information. This procedure runs without errors, so it returns another value to illustrate how returned codes are filled.

USE Northwind
GO
DROP PROCEDURE OrderSummary
GO
CREATE PROCEDURE OrderSummary @MaxQuantity INT OUTPUT AS
-- SELECT to return a result set summarizing
-- employee sales.
SELECT Ord.EmployeeID, SummSales = SUM(OrDet.UnitPrice * OrDet.Quantity)
FROM Orders AS Ord
JOIN [Order Details] AS OrDet ON (Ord.OrderID = OrDet.OrderID)
GROUP BY Ord.EmployeeID
ORDER BY Ord.EmployeeID

-- SELECT to fill the output parameter with the
-- maximum quantity from Order Details.
SELECT @MaxQuantity = MAX(Quantity) FROM [Order Details]

-- Return the number of all items ordered.
RETURN (SELECT SUM(Quantity) FROM [Order Details])
GO

-- Test the stored procedure.

-- DECLARE variables to hold the return code
-- and output parameter.
DECLARE @OrderSum INT
DECLARE @LargestOrder INT

-- Execute the procedure, which returns
-- the result set from the first SELECT.
EXEC @OrderSum = OrderSummary @MaxQuantity = @LargestOrder OUTPUT

-- Use the return code and output parameter.
PRINT 'The size of the largest single order was: ' +
CONVERT(CHAR(6), @LargestOrder)
PRINT 'The sum of the quantities ordered was: ' +
CONVERT(CHAR(6), @OrderSum)
GO

The output from running this sample is:

EmployeeID SummSales 
----------- -------------------------- 
1 202,143.71 
2 177,749.26 
3 213,051.30 
4 250,187.45 
5 75,567.75 
6 78,198.10 
7 141,295.99 
8 133,301.03 
9 82,964.00 
The size of the largest single order was: 130 
The sum of the quantities ordered was: 51317

See Also

In Other Volumes

"Stored Procedures" in Microsoft SQL Server Database Developer's Companion

Constraints, Rules, Defaults, and Triggers

Table columns have properties other than just data type and size. The other properties form an important part of the support for ensuring the integrity of data in a database.

Data integrity refers to each occurrence of a column having a correct data value.

The data values must be of the right data type and in the correct domain.
Referential integrity indicates that the relationships between tables have been properly maintained.

Data in one table should only point to existing rows in another table; it should not point to rows that do not exist.

Objects used to maintain both types of integrity include:

Constraints
Rules
Defaults
Triggers

Constraints

Constraints offer a way to have Microsoft SQL Server enforce the integrity of a database automatically. Constraints define rules regarding the values allowed in columns and are the standard mechanism for enforcing integrity, preferred over triggers, rules, and defaults. They are also used by the query optimizer to improve performance in selectivity estimation, cost calculations, and query rewriting.

There are five classes of constraints.

NOT NULL specifies that the column does not accept NULL values.
CHECK constraints enforce domain integrity by limiting the values that can be placed in a column.

A CHECK constraint specifies a Boolean (evaluates to TRUE or FALSE) search condition that is applied to all values entered for the column; all values that do not evaluate to TRUE are rejected. You can specify multiple CHECK constraints for each column. This sample shows the creation of a named constraint, chk_id, that further enforces the domain of the primary key by ensuring that only numbers within a specified range are entered for the key.
```
CREATE TABLE cust_sample
```

( cust_id int PRIMARY KEY, cust_name char(50), cust_address char(50), cust_credit_limit money, CONSTRAINT chk_id CHECK (cust_id BETWEEN 0 and 10000 ) )

UNIQUE constraints enforce the uniqueness of the values in a set of columns.

No two rows in the table are allowed to have the same nonNULL values for the columns in a UNIQUE constraint. Primary keys also enforce uniqueness, but primary keys do not allow NULL values. A UNIQUE constraint is preferred over a unique index.
PRIMARY KEY constraints identify the column or set of columns whose values uniquely identify a row in a table.

No two rows in a table can have the same primary key value. You cannot enter a NULL value for any column in a primary key. NULL is a special value in databases that represents an unknown value, which is distinct from a blank or 0 value. Using a small, integer column as a primary key is recommended. Each table should have a primary key.

A table may have more than one combination of columns that could uniquely identify the rows in a table; each combination is a candidate key. The database administrator picks one of the candidate keys to be the primary key. For example, in the part_sample table both part_nmbr and part_name could be candidate keys, but only part_nmbr is chosen as a primary key.
```
CREATE TABLE part_sample
```

(part_nmbr int PRIMARY KEY, part_name char(30), part_weight decimal(6,2), part_color char(15) )

FOREIGN KEY constraints identify the relationships between tables.

A foreign key in one table points to a candidate key in another table. You cannot insert a row with a foreign key value (except NULL) if there is no candidate key with that value. You cannot delete a row from the referenced table if there are any foreign key values referencing that candidate key. In the following sample, the order_part table establishes a foreign key referencing the part_sample table defined earlier. Normally, order_part would also have a foreign key against an order table, but this is a simple example.
```
CREATE TABLE order_part
```

(order_nmbr int, part_nmbr int FOREIGN KEY REFERENCES part_sample(part_nmbr), qty_ordered int) GO

Constraints can be column constraints or table constraints:

A column constraint is specified as part of a column definition and applies only to that column (the constraints in the earlier samples are column constraints).
A table constraint is declared independently from a column definition and can apply to more than one column in a table.

Table constraints must be used when more than one column must be included in a constraint.

For example, if a table has two or more columns in the primary key, you must use a table constraint to include both columns in the primary key. Consider a table that records events happening in a machine in a factory. Assume that events of several types can happen at the same time, but that no two events happening at the same time can be of the same type. This can be enforced in the table by including both the type and time columns in a two-column primary key.

CREATE TABLE factory_process
(event_type int,
event_time datetime,
event_site char(50),
event_desc char(1024),
CONSTRAINT event_key PRIMARY KEY (event_type, event_time) )

Rules

Rules are a backward compatibility feature that perform some of the same functions as CHECK constraints. CHECK constraints are the preferred, standard way to restrict the values in a column. CHECK constraints are also more concise than rules; there can only be one rule applied to a column, but multiple CHECK constraints can be applied. CHECK constraints are specified as part of the CREATE TABLE statement, while rules are created as separate objects and then bound to the column.

This example creates a rule that performs the same function as the CHECK constraint example in the preceding topic. The CHECK constraint is the preferred method to use in Microsoft SQL Server.

CREATE RULE id_chk AS @id BETWEEN 0 and 10000
GO
CREATE TABLE cust_sample
(
cust_id int PRIMARY KEY,
cust_name char(50),
cust_address char(50),
cust_credit_limit money,
)
GO
sp_bindrule id_chk, 'cust_sample.cust_id'
GO

Defaults

Defaults specify what values are used in a column if you do not specify a value for the column when inserting a row. Defaults can be anything that evaluates to a constant:

Constant
Built-in function
Mathematical expression

There are two ways to apply defaults:

Create a default definition using the DEFAULT keyword in CREATE TABLE to assign a constant expression as a default on a column.

This is the preferred, standard method. It is also the more concise way to specify a default.
Create a default object using the CREATE DEFAULT statement and bind it to columns using the sp_bindefault system stored procedure.

This is a backward compatibility feature.

This example creates a table using one of each type of default. It creates a default object to assign a default to one column, and binds the default object to the column. It then does a test insert without specifying values for the columns with defaults and retrieves the test row to verify the defaults were applied.

USE pubs
GO
CREATE TABLE test_defaults
(keycol smallint,
process_id smallint DEFAULT @@SPID, --Preferred default definition
date_ins datetime DEFAULT getdate(), --Preferred default definition
mathcol smallint DEFAULT 10 * 2, --Preferred default definition
char1 char(3),
char2 char(3) DEFAULT 'xyz') --Preferred default definition
GO
/* Illustration only, use DEFAULT definitions instead.*/
CREATE DEFAULT abc_const AS 'abc'
GO
sp_bindefault abc_const, 'test_defaults.char1'
GO
INSERT INTO test_defaults(keycol) VALUES (1)
GO
SELECT * FROM test_defaults
GO

The output of this sample is:

Default bound to column.

(1 row(s) affected)

keycol process_id date_ins mathcol char1 char2 
------ ---------- --------------------------- ------- ----- ----- 
1 7 Oct 16 1997 8:34PM 20 abc xyz 

(1 row(s) affected)

Triggers

Triggers are a special class of stored procedure defined to execute automatically when an UPDATE, INSERT, or DELETE statement is issued against a table. Triggers are a powerful tool that allows each site to enforce their business rules automatically when data is modified. Triggers can extend the integrity checking logic of Microsoft SQL Server constraints, defaults, and rules, although constraints and defaults should be used instead whenever they provide all the needed functionality.

Tables can have multiple triggers. The CREATE TRIGGER statement can be defined with the FOR UPDATE, FOR INSERT, or FOR DELETE clauses to target a trigger to a specific class of data modification actions. When FOR UPDATE is specified, the IF UPDATE (column_name) clause can be used to target a trigger to updates affecting a particular column. SQL Server allows you to specify multiple triggers for a specific action (UPDATE, INSERT, or DELETE) on a single table.

Triggers can automate a company's processing. In an inventory system, update triggers can detect when a stock level reaches a reorder point and generate an order to the supplier automatically. In a database recording the processes in a factory, triggers can e-mail or page operators when a process exceeds defined safety limits.

The following trigger generates an e-mail whenever a new title is added in the pubs database.

CREATE TRIGGER reminder
ON titles
FOR INSERT
AS
EXEC master..xp_sendmail 'MaryM',
'New title, mention in the next report to distributors.'

Triggers contain Transact-SQL statements, much the same as stored procedures. Triggers, like stored procedures, return the result set generated by any SELECT statements in the trigger. Including SELECT statements in triggers, except ones that only fill parameters, is not recommended because users do not expect to see any result sets on an UPDATE, INSERT, or DELETE statement.

Triggers execute after the statement that triggered them completes. If the statement fails with an error, such as a constraint violation or syntax error, the trigger is not executed.

See Also

In Other Volumes

"Enforcing Business Rules with Triggers" in Microsoft SQL Server Database Developer's Companion

Code Pages and Sort Orders

The physical storage of character strings in Microsoft SQL Server is controlled by the code page and sort order selected during installation.

While the code page and sort order control the format of how data is stored in each database, the specification of the code page and sort order are global to the server. Every database attached to a specific SQL Server installation uses the same code page and sort order. The code page and sort order are specified during SQL Server Setup.

Code Pages

Software interprets the data in character strings using code pages. The eight bits in one byte can be arranged in 256 different patterns, so code pages that use one byte per character can only support 256 characters. A code page defines which bit patterns in a byte represent each character. Code pages define bit patterns for upper and lowercase characters, digits, symbols, and special characters such as !, @, #, or %.

There are several code pages. The default code page for a server running SQL Server is the 1252 - ISO character set, which contains the common characters from most languages that originated in Western Europe. The bit patterns can be represented by decimal numbers, the first bit pattern is 0, the second 1, and so on to the last bit pattern, which is represented by the value 255. The bit patterns from 32 to 126 represent the same characters on all code pages, while the characters represented by the bit values 0 through 31 and 127 through 255 vary between code pages. The values 32 through 126 cover the characters, digits, and special characters typically used in United States English text. The values 0 through 31 and 127 through 255 are known as the extended character set and represent characters typically used in languages other than U.S. English. Items that vary between countries, such as currency symbols, are also in the extended character set. Systems that store data that does not use any of the extended characters have no concern about what code page is used in the different system components. Systems using extended characters must ensure that all computers use the same code page, or must be able to successfully translate data stored with one code page onto a computer running another code page without losing any of the extended characters.

When you pick a code page during the setup of SQL Server, the data in all character columns in the database, and in all character variables and parameters in Transact-SQL statements, is stored and interpreted using the bit patterns from the indicated code page. SQL Server can be installed with a different code page than the code page used by the server operating system. If the data in a server running SQL Server contains extended characters, care must be used in determining the code pages used in the database and on the clients. If both the database and all the clients are running the same code page, there are no translation issues. If the server is running one code page and clients are using another code page, then the clients may need to turn on options in the SQL Server ODBC driver and OLE DB provider to support the proper conversion of extended characters. In SQL Server version 7.0, the SQL Server ODBC driver, and the OLE DB Provider for SQL Server handle this conversion automatically. You only need to specify it manually when using earlier versions of SQL Server or the SQL Server ODBC driver, or when using DB-Library.

International Data and Unicode

Storing data in multiple languages within one database is difficult to manage when using only character data and code pages. It is difficult to find one code page for the database that can store all the required language-specific characters. It is also difficult to ensure the proper translation of special characters when being read or updated by different clients running various code pages.

For example, a database of customers in North America has to handle three major languages:

Spanish names and addresses for Mexico.
French names and addresses for Quebec.
English names and addresses for the rest of Canada and the United States.

When using only character columns and code pages, care has to be taken to ensure the database is installed with a code page that will handle the characters of all three languages. More care must be taken to ensure the proper translation of characters from one of the languages when read by clients running a code page for another languages.

A new feature in SQL Server 7.0 is support for Unicode data types, which eliminates the problem in converting characters. Unicode stores character data using two bytes for each character rather than one byte. There are 65,536 different bit patterns in two bytes, so Unicode can use one standard set of bit patterns to encode each character in all languages, including languages such as Chinese that have large numbers of characters. Data stored in Unicode columns is not affected by code pages. Programming languages also support Unicode data types, such as the C WCHAR data type. If an application uses Unicode variables to hold the data received from or sent to Unicode columns, there is no need for character translations.

It is easier to manage international databases if you:

Install SQL Server with the default code page.
Implement all char, varchar, and text columns using their Unicode equivalents; nchar, nvarchar, and ntext.

The fact that Unicode data needs twice as much storage space is offset by eliminating the need to convert extended characters between code pages.

SQL Server 7.0 stores all textual system catalog data in columns having Unicode data types. The names of database objects such as tables, views, and stored procedures are stored in Unicode columns. This allows applications to be developed using only Unicode, which avoids all issues with code page conversions.

Sort Order

The sort order is another option specified during the Setup program. The sort order specifies the rules used by SQL Server to collate, compare, and present character data. It also specifies whether SQL Server is case-sensitive.

SQL Server 7.0 uses two sort orders, one for the character code page and one for Unicode data. Both sort orders are specified during setup.

The sort orders are different for each SQL Server code page. For example, the sort orders available with code page 1252 are:

Dictionary order, case-insensitive (default)
Binary order
Dictionary order, case-sensitive
Dictionary order, case-insensitive, uppercase preference
Dictionary order, case-insensitive, accent-insensitive
Danish/Norwegian dictionary order, case-insensitive, uppercase preference
Icelandic dictionary order, case-insensitive, uppercase preference
Swedish/Finnish (standard) dictionary order, case-insensitive, uppercase preference
Swedish/Finnish (phonetic) dictionary order, case-insensitive, uppercase preference

The following table shows how the sort order affects comparisons and sorting.

Sort order	Comparison examples	Sorting example
Dictionary order, case-insensitive	A = a, Ä = ä, Å = å, a ¹ à ¹ á ¹ â ¹ ä ¹ å, A ¹ Ä ¹ Å	A, a, à, á, â, ä, Ä, Å, å (See Note 1)
Binary	(See Note 2)	(See Note 2)
Dictionary order, case-sensitive	A ¹ a, Ä ¹ ä, Å ¹ å, a ¹ à ¹ á ¹ â ¹ ä ¹ å, A ¹ Ä ¹ Å	A, a, à, á, â, Ä, ä, Å, å
Dictionary order, case-insensitive, uppercase preference	A = a, Ä = ä, Å = å, a ¹ à ¹ á ¹ â ¹ ä ¹ å, A ¹ Ä ¹ Å	A, a, à, á, â, Ä, ä, Å, å
Dictionary order, case-insensitive, accent-insensitive	A = a = à = á = â = Ä = ä = Å = å	A, a, à, á, â, Ä, ä, Å, å

1 This is just one example of how the characters might be sorted. Because this sort order does not have uppercase preference, you cannot predict whether an uppercase character will come before or after its corresponding lowercase character.
2 Comparisons and sorting for this sort order are based on the numeric values of the characters in the installed character set.

See Also

In This Volume

Character Set

Sort Order

In Other Volumes

"Advanced Topics" in Microsoft SQL Server Database Developer's Companion

"Unicode Data" in Microsoft SQL Server Database Developer's Companion

"Using Unicode Data" in Microsoft SQL Server Database Developer's Companion

Table Indexes

A Microsoft SQL Server index is a structure associated with a table that speeds retrieval of the rows in the table. An index contains keys built from one or more columns in the table. These keys are stored in a structure that allows SQL Server to find the row or rows associated with the key values quickly and efficiently.

If a table is created with no indexes, the data rows are not stored in any particular order. This structure is called a heap.

The two types of SQL Server indexes are:

Clustered

Clustered indexes sort and store the data rows in the table based on their key values. Because the data rows are stored in sorted order on the clustered index key, clustered indexes are efficient for finding rows. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. The data rows themselves form the lowest level of the clustered index.

The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. If a table has no clustered index, its data rows are stored in a heap.
Nonclustered

Nonclustered indexes have a structure that is completely separate from the data rows. The lowest rows of a nonclustered index contain the nonclustered index key values and each key value entry has pointers to the data rows containing the key value. The data rows are not stored in order based on the nonclustered key.

The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or are clustered. For a heap, a row locator is a pointer to the row. For a table with a clustered index, the row locator is the clustered index key.

The only time the rows in a table are stored in any specific sequence is when a clustered index is created on the table. The rows are then stored in sequence on the clustered index key. If a table only has nonclustered indexes, its data rows are stored in a unordered heap.

Indexes can be unique, which means no two rows can have the same value for the index key. Otherwise, the index is not unique and multiple rows can share the same key value.

There are two ways to define indexes in SQL Server. The CREATE INDEX statement creates and names an index. The CREATE TABLE statement supports the following constraints that create indexes:

PRIMARY KEY creates a unique index to enforce the primary key.
UNIQUE creates a unique index.
CLUSTERED creates a clustered index.
NONCLUSTERED creates a nonclustered index.

A fill factor is a property of a SQL Server index that controls how densely the index is packed when created. The default fill factor usually delivers good performance, but in some cases it may be beneficial to change the fill factor. If the table is going to have many updates and inserts, create an index with a low fill factor to leave more room for future keys. If the table is a read-only table that will not change, create the index with a high fill factor to reduce the physical size of the index, which lowers the number of disk reads SQL Server uses to navigate through the index. Fill factors are only applied when the index is created. As keys are inserted and deleted, the index will eventually stabilize at a certain density.

Indexes not only speed up the retrieval of rows for selects, they also usually increase the speed of updates and deletes. This is because SQL Server must first find a row before it can update or delete the row. The increased efficiency of using the index to locate the row usually offsets the extra overhead needed to update the indexes, unless the table has a lot of indexes.

This example shows the Transact-SQL syntax for creating indexes on a table.

USE pubs
GO
CREATE TABLE emp_sample
(emp_id int PRIMARY KEY CLUSTERED,
emp_name char(50),
emp_address char(50),
emp_title char(25) UNIQUE NONCLUSTERED )
GO
CREATE NONCLUSTERED INDEX sample_nonclust ON emp_sample(emp_name)
GO

Deciding which particular set of indexes will optimize performance depends on the mix of queries in the system. Consider the clustered index on emp_sample.emp_id. This works well if most queries referencing emp_sample have equality or range comparisons on emp_id in their WHERE clauses. If the WHERE clauses of most queries reference emp_name instead of emp_id, performance could be improved by instead making the index on emp_name the clustered index.

Many applications have a complex mix of queries that is difficult to estimate by interviewing users and programmers. SQL Server version 7.0 provides an Index Tuning Wizard to help design indexes in a database. The easiest way to design indexes for large schemas with complex access patterns is to use the Index Tuning Wizard.

You provide the Index Tuning Wizard with a set of SQL statements. This could be a script of statements you build to reflect a typical mix of statements in the system, but it is usually a SQL Server Profiler trace of the actual SQL statements processed on the system during a period of time that reflects the typical load on the system. The Index Tuning Wizard analyzes the workload and the database, then recommends an index configuration that will improve the performance of the workload. You can choose to either replace the existing index configuration, or to keep the existing index configuration and implement new indexes to improve the performance of a slow-running subset of the queries.

See Also

In Other Volumes

"Indexes" in Microsoft SQL Server Database Developer's Companion

Full-text Catalogs and Indexes

A Microsoft SQL Server full-text index provides efficient support for sophisticated word searches in character string data. The full-text index stores information about significant words and their location within a given column. This information is used to quickly complete full-text queries that search for rows with particular words or combinations of words.

Full-text indexes are contained in full-text catalogs. Each database can contain one or more full-text catalogs. A catalog cannot belong to multiple databases and each catalog can contain full-text indexes for one or more tables. A table can only have one full-text index, so each table with a full-text index belongs to only one full-text catalog.

Full-text catalogs and indexes are not stored in the database to which they belong. The catalogs and indexes are managed separately by the Microsoft Search service.

A full-text index must be defined on a base table; it cannot be defined on a view, system table, or temporary table. A full-text index definition includes:

A column that uniquely identifies each row in the table (primary or candidate key) and does not allow NULLs.
One or more character string columns which are covered by the index.

The full-text index is populated with the key values. The entry for each key has information about the significant words (noise-words or stop-words are stripped out) that are associated with the key, the column they are in, and their location in the column.

Transact-SQL has two new predicates for testing rows against a full-text search condition:

CONTAINS
FREETEXT

Transact-SQL also has two new functions that returns the set of rows that match a full-text search condition:

CONTAINSTABLE
FREETEXTTABLE

Internally, SQL Server sends the search condition to the Microsoft Search service. The Microsoft Search service finds all the keys that match the full-text search condition and returns them to SQL Server. SQL Server then uses the list of keys to determine which table rows are to be processed.

See Also

In This Volume

Microsoft Search Service

Full-text Query Architecture

In Other Volumes

"Full-text Querying SQL Server Data" in Microsoft SQL Server Database Developer's Companion

Creating and Maintaining Full-text Indexes

Full-text catalogs and indexes can be defined and managed using:

The SQL Server Enterprise Manager GUI and wizards.
Applications that call the SQL Distributed Management Objects (SQL-DMO) full-text catalog and full-text index objects and methods.
Applications that use Transact-SQL and call system stored procedures for managing full-text catalogs and indexes.

Using Transact-SQL as an example, the steps to define, activate, and populate full-text indexes are:

Install the Microsoft Search service during SQL Server Setup, and then run the service. You must choose the Custom setup option and select the Full-text search subcomponent.
Enable the database to support full-text indexes by calling sp_fulltext_database.
Create each full-text catalog is created by calling sp_fulltext_catalog with the create option.
Associate each table that will have a full-text index with a catalog and its index named by calling sp_fulltext_table with the create option.
Add each column that participates in a full-text index to the index definition by calling sp_fulltext_column with the add option.
Establish the full-text start-seed value for each table by calling sp_fulltext_table with the activate option.
Populate all of the full-text indexes in a catalog at one time by calling sp_fulltext_catalog with the start_full option.

Full-text indexes are not kept up to date automatically as data is modified in the associated tables. Tables that can be updated should have their full-text indexes repopulated at appropriate intervals. The population can be time-consuming, so it is an asynchronous process that is usually run in the background. The population process has to perform sometimes complex linguistic analysis on the source strings; scanning the strings to determine word boundaries (called word-breaking) and eliminating noise-words. Both full and incremental population is supported, and is started by calling sp_fulltext_catalog. It is best to schedule jobs that run periodically to do this.

The data in full-text catalogs and indexes are not recovered by a system recovery. They are also not backed up or restored by the BACKUP and RESTORE statements. After a recovery or restore operation, the full-text catalogs and indexes should be resynchronized with the base tables. Because the metadata defining the full-text catalogs and indexes is stored in the database, they can be repopulated easily after the database has been recovered.

See Also

In Other Volumes

"Full-text Indexes" in Microsoft SQL Server Database Developer's Companion

Logins, Users, Roles, and Groups

Logins, users, roles, and groups are the foundation for the security mechanisms of Microsoft SQL Server. Users that connect to SQL Server must identify themselves using a specific login ID. Users can then only see the tables and views they are authorized to see, and can only execute the stored procedures and administrative functions they are authorized to execute. This system of security is based on the IDs used to identify users.

See Also

In Other Volumes

"Managing Security" in Microsoft SQL Server Administrator's Companion

Logins

Login IDs are associated with users when they connect to Microsoft SQL Server. They are the accounts that control access to the SQL Server system. A user cannot connect to SQL Server without first specifying a valid login ID. Login IDs are defined in SQL Server by members of the sysadmin fixed server role.

sp_grantlogin authorizes a Microsoft Windows NT network account (either a group or a user account) to be used as a SQL Server login for connecting to SQL Server using Windows NT Authentication. sp_addlogin defines a login account for SQL Server connections using SQL Server Authentication.

See Also

In This Volume

Logins

In Other Volumes

"sp_grantlogin" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_addlogin" in Microsoft SQL Server Transact-SQL and Utilities Reference

Users

A user ID identifies a user within a database. All permissions and ownership of objects in the database are controlled by the user account. User accounts are specific to a database; the xyz user account in the sales database is different from the xyz user account in the inventory database, even though both accounts have the same ID. User IDs are defined by members of the db_owner fixed database role.

A login ID by itself does not give a user permissions to access objects in any databases. A login ID must be associated with a user ID in each database before anyone connecting with that login ID can access objects in the databases. If a login ID has not been explicitly associated with any user ID in a database, it is associated with the guest user ID. If a database has no guest user account, a login cannot access the database unless it has been associated with a valid user account.

When a user ID is defined, it is associated with a login ID. For example, a member of the db_owner role can associate the Microsoft Windows NT login NETDOMAIN\Joe with user ID abc in the sales database and user ID def in the employee database. The default is for the login ID and user ID to be the same.

This example shows giving a Windows NT account access to a database and associating the login with a user in the database:

USE master
GO
sp_grantlogin 'NETDOMAIN\Sue'
GO
sp_defaultdb @loginame = 'NETDOMAIN\Sue', defdb = 'sales'
GO
USE sales
GO
sp_grantdbaccess 'NETDOMAIN\Sue', 'Sue'
GO

In the sp_grantlogin statement, the Windows NT user NETDOMAIN\Sue is given access to Microsoft SQL Server The sp_defaultdb statement makes the sales database her default database. The sp_grantdbaccess statement gives the login NETDOMAIN\Sue access to the sales database and sets her user ID within sales to Sue.

This example shows defining a SQL Server login, assigning a default database, and associating the login with a user in the database:

USE master
GO
sp_addlogin @loginame = 'TempWorker', @password = 'fff', defdb = 'sales'
GO
USE sales
GO
sp_grantdbaccess 'TempWorker'
GO

The sp_addlogin statement defines a SQL Server login that will be used by various temporary workers. The statement also specifies the sales database as the default database for the login. The sp_grantdbaccess statement grants the TempWorker login access to the sales database; because no username is specified, it defaults to TempWorker.

A user in a database is identified by the user ID, not the login ID. For example, sa is a login account mapped automatically to the special user account dbo (database owner) in every database. All the security-related statements use the user ID as the security_name parameter. It is less confusing if the members of the sysadmin fixed server role and the db_owner fixed database role set up the system such that the login ID and user ID of each user are the same, but it is not a requirement.

The guest account is a special user account in SQL Server databases. If a user enters a USE database statement to access a database in which they are not associated with a user account, they are instead associated with the guest user.

See Also

In Other Volumes

"guest User" in Microsoft SQL Server Administrator's Companion

"sp_defaultdb" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_grantlogin" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_grantdbaccess" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_addlogin" in Microsoft SQL Server Transact-SQL and Utilities Reference

Roles

Roles are a powerful tool that allow you to collect users into a single unit against which you can apply permissions. Permissions granted to, denied to, or revoked from a role also apply to any members of the role. You can establish a role that represents a job performed by a class of workers in your organization and grant the appropriate permissions to that role. As workers rotate into the job, you simply add them as a member of the role; as they rotate out of the job, remove them from the role. You do not have to repeatedly grant, deny, and revoke permissions to or from each person as they accept or leave the job. The permissions are applied automatically when the users become members of the role.

Windows NT groups can be used in much the same way as roles. For more information, see "Groups" in this volume.

It is easy to manage the permissions in a database if you define a set of roles based on job functions and assign each role the permissions that apply to that job. You can then simply move users between roles rather than having to manage the permissions for each individual user. If the function of a job changes, it is easier to simply change the permissions once for the role and have the changes applied automatically to all members of the role.

In Microsoft SQL Server version 7.0, users can belong to multiple roles.

The following script shows adding a few logins, users, and roles, and granting permissions to the roles.

USE master
GO
sp_grantlogin 'NETDOMAIN\John'
GO
sp_defaultdb 'NETDOMAIN\John', 'courses'
GO
sp_grantlogin 'NETDOMAIN\Sarah'
GO
sp_defaultdb 'NETDOMAIN\Sarah', 'courses'
GO
sp_grantlogin 'NETDOMAIN\Betty'
GO
sp_defaultdb 'NETDOMAIN\Betty', 'courses'
GO
sp_grantlogin 'NETDOMAIN\Ralph'
GO
sp_defaultdb 'NETDOMAIN\Ralph', 'courses'
GO
sp_grantlogin 'NETDOMAIN\Diane'
GO
sp_defaultdb 'NETDOMAIN\Diane', 'courses'
GO
USE courses
GO
sp_grantdbaccess 'NETDOMAIN\John'
GO
sp_grantdbaccess 'NETDOMAIN\Sarah'
GO
sp_grantdbaccess 'NETDOMAIN\Betty'
GO
sp_grantdbaccess 'NETDOMAIN\Ralph'
GO
sp_grantdbaccess 'NETDOMAIN\Diane'
GO
sp_addrole 'Professor'
GO
sp_addrole 'Student'
GO
sp_addrolemember 'Professor', 'NETDOMAIN\John'
GO
sp_addrolemember 'Professor', 'NETDOMAIN\Sarah'
GO
sp_addrolemember 'Professor', 'NETDOMAIN\Diane'
GO
sp_addrolemember 'Student', 'NETDOMAIN\Betty'
GO
sp_addrolemember 'Student', 'NETDOMAIN\Ralph'
GO
sp_addrolemember 'Student', 'NETDOMAIN\Diane'
GO
GRANT SELECT ON StudentGradeView TO Student
GO
GRANT SELECT, UPDATE ON ProfessorGradeView TO Professor
GO

This script gives the professors John and Sarah permission to update students grades, while the students Betty and Ralph can only select their grades. Diane has been added to both roles because she is teaching one class while taking another. The view ProfessorGradeView should restrict professors to the rows for students in their classes, while StudentGradeView should restrict students to selecting only their own grades.

There are several fixed roles defined in SQL Server 7.0 during setup. Users can be added to these roles to pick up the associated administration permissions. These are serverwide roles.

Fixed server role	Description
sysadmin	Can perform any activity in SQL Server
serveradmin	Can set serverwide configuration options, shut down the server
setupadmin	Can manage linked servers and startup procedures
securityadmin	Can manage logins and CREATE DATABASE permissions, also read error logs
processadmin	Can manage processes running in SQL Server
dbcreator	Can create and alter databases
diskadmin	Can manage disk files

You can get a list of the fixed server roles from sp_helpsrvrole, and get the specific permissions for each role from sp_srvrolepermission.

Each database has a set of fixed database roles. While roles with the same names exist in each database, the scope of an individual role is only within a specific database. For example, if Database1 and Database2 both have user IDs named UserX, adding UserX in Database1 to the db_owner fixed database role for Database1 has no effect on whether UserX in Database2 is a member of the db_owner role for Database2.

Fixed database role	Description
db_owner	Has all permissions in the database
db_accessadmin	Can add or remove user IDs
db_securityadmin	Can manage all permissions, object ownerships, roles and role memberships
db_ddladmin	Can issue ALL DDL, but cannot issue GRANT, REVOKE, or DENY statements
db_backupoperator	Can issue DBCC, CHECKPOINT, and BACKUP statements
db_datareader	Can select all data from any user table in the database
db_datawriter	Can modify any data in any user table in the database
db_denydatareader	Can deny or revoke SELECT permissions on any object
db_denydatawriter	Can deny or revoke INSERT, UPDATE, and DELETE permissions on any object

You can get a list of the fixed database roles from sp_helpdbfixedrole, and get the specific permissions for each role from sp_dbfixedrolepermission.

Every user in a database belongs to the public database role. If you want everyone in a database to be able to have a specific permission, assign the permission to the public role. If a user has not been specifically granted permissions on an object, they use the permissions assigned to public.

See Also

In Other Volumes

"Adding a Member to a Predefined Role" in Microsoft SQL Server Administrator's Companion

"sp_dbfixedrolepermission" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_helpdbfixedrole" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_helpsrvrole" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_srvrolepermission" in Microsoft SQL Server Transact-SQL and Utilities Reference

Groups

There are no groups in Microsoft SQL Server version 7.0. The groups from earlier versions of SQL Server have been replaced with roles, which are more powerful. You can, however, manage SQL Server security at the level of an entire Microsoft Windows NT group.

If you use sp_grantlogin and specify the name of a Windows NT group, all members of the group can then connect to SQL Server using Windows NT Authentication.

After the group has been authorized to connect, you can use sp_grantdbaccess to associate the group members with a user ID in each database they need to access. You can use two methods:

Associate the group with a user ID in the database.

In this case, all members of the group will be associated with that user ID when they reference the database.
Associate an individual user account in the Windows NT group with a user ID in the database.

This individual will be associated with the user ID when they reference the database. None of the other individuals in the group will be associated with the user ID. They will be assigned the user ID associated with the group login.

Consider a Windows NT group NETDOMAIN\Managers with three members: NETDOMAIN\Sue, NETDOMAIN\Fred, and NETDOMAIN\Mary. The following Transact-SQL statements add the Windows NT group as both a login and a user in the sales database, and then associate NETDOMAIN\Sue with a specific user ID:

USE master
GO
-- Authorize all members of NETDOMAIN\Managers to connect
-- using Windows NT Authentication.
sp_grantlogin 'NETDOMAIN\Managers'
GO
-- Make sales the default database for all members.
sp_dbdefault 'NETDOMAIN\Managers', 'sales'
USE sales
GO
-- Grant all members of the group access to sales
-- No user ID is specified, so SQL Server creates
-- one named 'NETDOMAIN\Managers'
sp_grantdbaccess 'NETDOMAIN\Managers'
GO
-- Grant a specific member of the group access to
-- sales with a specific user.
sp_grantdbaccess 'NETDOMAIN\Sue', 'Sue'

Permissions can now be granted to either user NETDOMAIN\Managers or user Sue:

USE sales
GO
GRANT SELECT ON SalesTable TO NETDOMAIN\Managers
GO
GRANT UPDATE ON SalesTable to NETDOMAIN\Sue

The permissions applied to NETDOMAIN\Sue are the union of the permissions granted, revoked, or denied to both the NETDOMAIN\Managers or Sue users. Any DENY permission overrides any corresponding GRANT permissions.

Unless their Windows NT account has been associated with a specific user, members of a group are subject to the permissions assigned to the user associated with the group. If a member of the group creates an object, however, the owner name of the object is their Windows NT account name, not the group name. Consider the NETDOMAIN\Manager account. If NETDOMAIN\Fred connects to the sales database, he can see all tables for which NETDOMAIN\Managers has been granted SELECT permission. If NETDOMAIN\Fred executes the following statement, the table is created as sales.NETDOMAIN\Fred.TableX, not sales.NETDOMAIN\Managers.TableX:

CREATE TableX (cola INT PRIMARY KEY, colb CHARACTER(200))

See Also

In Other Volumes

"sp_grantlogin" in Microsoft SQL Server Transact-SQL and Utilities Reference

"sp_grantdbaccess" in Microsoft SQL Server Transact-SQL and Utilities Reference

Owners and Permissions

Every object in Microsoft SQL Server is owned by a user. The owner is identified by a database user ID. When an object is first created, the only user ID that can access the object is the user ID of the owner or creator. For any other user to access the object, the owner must grant permissions to that user. If the owner wants only specific users to access the object, the owner can grant permissions to those specific users.

For tables and views, the owner can grant INSERT, UPDATE, DELETE, SELECT, and REFERENCES permissions, or ALL permissions. A user must have INSERT, UPDATE, DELETE, or SELECT permissions on a table before they can specify it in INSERT, UPDATE, DELETE, or SELECT statements. The REFERENCES permission lets the owner of another table use columns in your table as the target of a REFERENCES FOREIGN KEY constraint from their table. The following example illustrates granting SELECT permissions to a group named Teachers and REFERENCES permissions to another development user:

GRANT SELECT ON MyTable TO Teachers
GRANT REFERENCES (PrimaryKeyCol) ON MyTable to DevUser1

The owner of a stored procedure can grant EXECUTE permissions for the stored procedure. If the owner of a base table wants to prevent users from accessing the table directly, they can grant permissions on views or stored procedures referencing the table, but not grant any permissions on the table itself. This is the foundation of the SQL Server mechanisms to ensure that users do not see data they are not authorized to access.

Users can also be granted statement permissions. Some statements, such as CREATE TABLE and CREATE VIEW, can only be executed by certain users (in this case, the dbo user). If the dbo wants another user to be able to create tables or views, they must grant the permission to execute these statements to that user.

System Databases and Data

Microsoft SQL Server systems have four system databases:

master

The master database records all of the system level information for a SQL Server system. It records all login accounts and all system configuration settings. master is the database that records the existence of all other databases and the location of the primary files that contain the initialization information for the user databases. master records the initialization information for SQL Server; always have a recent backup of master available.
tempdb

tempdb holds all temporary tables and temporary stored procedures. It also fills any other temporary storage needs such as work tables generated by SQL Server. tempdb is a global resource; the temporary tables and stored procedures for all users connected to the system are stored there. tempdb is re-created every time SQL Server is started so the system starts with a clean copy of the database. Because temporary tables and stored procedures are dropped automatically on disconnect, and no connections are active when the system is shut down, there is never anything in tempdb to be saved from one session of SQL Server to another.

tempdb autogrows as needed. Each time the system is started, tempdb is reset to its default size. You can avoid the overhead of having tempdb autogrow by using ALTER DATABASE to increase the size of tempdb.
model

The model database is used as the template for all databases created on a system. When a CREATE DATABASE statement is issued, the first part of the database is created by copying in the contents of the model database, then the remainder of the new database is filled with empty pages. Because tempdb is created every time SQL Server is started, the model database must always exist on a SQL Server system.
msdb

The msdb database is used by SQL Server Agent for scheduling alerts and jobs, and recording operators.

In SQL Server version 7.0, every database, including the system databases, has its own set of files and does not share those files with other databases. The default location for these files is the C:\Mssql7\Data directory.

Database file	Physical file name	Default size, typical setup
master primary data	Master.mdf	7.5 MB
master log	Mastlog.ldf	1.0 MB
tempdb primary data	Tempdb.mdf	8.0 MB
tempdb log	Templog.ldf	0.5 MB
model primary data	Model.mdf	0.75 MB
model log	Modellog.ldf	0.75 MB
msdb primary data	Msdbdata.mdf	3.5 MB
msdb log	Msdblog.ldf	0.75 MB

In earlier versions of SQL Server, the master and model system databases were on a single file, known as the master device. The first 2 MB allocation of tempdb also resided on the master device, and sometimes the pubs sample database. The restriction of having these databases all reside in a single file sometimes caused problems with space in the master and model databases. In SQL Server 7.0, all of these databases have their own set of files that can grow independently of each other.

Each database in SQL Server contains system tables recording the data needed by the SQL Server components. The successful operation of SQL Server depends on the integrity of information in the system tables; therefore, Microsoft does not support users directly updating the information in the system tables.

Microsoft provides a complete set of administrative tools that allow users to fully administer their system and manage all users and objects in a database. Users can use the administration utilities, such as SQL Server Enterprise Manager, to directly manage the system. Programmers can use the SQL-DMO API to include complete functionality for administering SQL Server in their applications. Programmers building Transact-SQL scripts and stored procedures can use the system stored procedures and Transact-SQL DDL statements to support all administrative functions in their systems.

An important function of SQL-DMO, system stored procedures, and DDL statements is to shield applications from changes in the system tables. Microsoft sometimes needs to change the system tables in new versions of SQL Server to support new functionality being added in that version. Applications issuing SELECT statements that directly reference system tables are frequently dependent on the old format of the system tables. Sites may not be able to upgrade to a new version of SQL Server until they have rewritten applications that are selecting from system tables. Microsoft considers the system stored procedures, DDL, and SQL-DMO published interfaces, and seeks to maintain the backward compatibility of these interfaces.

Microsoft does not support triggers defined on the system tables; they may alter the operation of the system.

Another important tool for querying the SQL Server catalog is the set of Information Schema Views. These views comply with the information schema defined in the SQL-92 standard. These views provide applications a standards-based component for querying the SQL Server catalog.

Physical Database Architecture

Microsoft SQL Server version 7.0 introduces significant improvements in the way data is stored physically. These changes are largely transparent to general SQL Server users, but do affect the setup and administration of SQL Server databases.

This topic describes how SQL Server 7.0 files and databases are organized. The SQL Server 7.0 organization is different from the organization of data in earlier versions of SQL Server. Descriptions of the physical database architecture of prior versions are in Inside Microsoft SQL Server 6.5 and the Microsoft SQL Server Resource Guide in version 2.0 of the Microsoft BackOffice Resource Kit, Part 2.

Pages and Extents

The fundamental unit of data storage in Microsoft SQL Server is the page. In SQL Server version 7.0, the size of pages is 8 KB. This means SQL Server 7.0 databases have 128 pages per megabyte.

The start of each page is a 96 byte header used to store system information such as the type of page, the amount of free space on the page, and the object ID of the object owning the page.

There are six types of pages in the data files of a SQL Server 7.0 database.

Page type	Contents
Data	Data rows with all data except text, ntext, and image data
Index	Index entries
Text/Image	text, ntext, and image data
Global Allocation Map	Information about allocated extents
Page Free Space	Information about free space available on pages
Index Allocation Map	Information about extents used by a table or index

Log files do not contain pages, they contain a series of log records.

Data pages contain all the data in data rows except text, ntext, and image data, which is stored in separate pages. Data rows are placed serially on the page starting immediately after the header. A row offset table starts at the end of the page. The row offset table contains one entry for each row on the page and each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Cc917544.pdart1(en-us,TechNet.10).gif

Rows cannot span pages in SQL Server. In SQL Server 7.0, the maximum amount of data contained in a single row is 8060 bytes, not including text, ntext, and image data.

Extents are the basic unit in which space is allocated to tables and indexes. An extent is 8 contiguous pages, or 64 KB. This means SQL Server 7.0 databases have 16 extents per MB.

To make its space allocation efficient, SQL Server 7.0 does not allocate entire extents to tables with small amounts of data. SQL Server 7.0 has two types of extents:

Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
Mixed extents are shared by up to eight objects.

A new table or index is allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it is switched to uniform extents.

Cc917544.pdart2(en-us,TechNet.10).gif

Physical Database Files and Filegroups

Microsoft SQL Server version 7.0 maps a database over a set of operating system files. Data and log information are never mixed on the same file, and individual files are used only by one database.

SQL Server 7.0 databases have three types of files:

Primary data files

The primary data file is the starting point of the database and points to the rest of the files in the database. Every database has one primary data file. The recommended file extension for primary data files is .mdf.
Secondary data files

Secondary data files comprise all of the data files other than the primary data file. Some databases may not have any secondary data files, while others have multiple secondary data files. The recommended file extension for secondary data files is .ndf.
Log files

Log files hold all of the log information used to recover the database. There must be at least one log file for each database, although there can be more than one. The recommended file extension for log files is .ldf.

SQL Server 7.0 does not enforce the .mdf, .ndf, and .ldf file extensions, but these extensions are recommended to help identify the use of the file.

SQL Server 7.0 files have two names:

logical_file_name is a name used to refer to the file in all Transact-SQL statements.

The logical file name must conform to the rules for SQL Server identifiers and must be unique to the database.
os_file_name is the name of the physical file.

It must follow the rules for Microsoft Windows NT or Microsoft Windows 95/98 file names.

Cc917544.pdart3(en-us,TechNet.10).gif

SQL Server data and log files can be placed on either FAT or NTFS file systems, but cannot be placed on compressed file systems.

Pages in a SQL Server 7.0 file are numbered sequentially starting with 0 for the first page in the file. Each file has a file ID number. Uniquely identifying a page in a database requires both the file ID and page number. The following example shows the page numbers in a database that has a 4-MB primary data file and a 1-MB secondary data file.

Cc917544.pdart4(en-us,TechNet.10).gif

The first page in each file is a file header page containing information about the attributes of the file. The ninth page in a primary data file is a database boot page containing information about the attributes of the database.

SQL Server 7.0 files can grow automatically from their originally specified size. When you define a file, you can specify a growth increment. Each time the file fills, it increases its size by the growth increment. If there are multiple files in a filegroup, they do not autogrow until all the files are full. Growth then occurs using a round-robin algorithm.

Each file can also have a maximum size specified. If a maximum size is not specified, the file can continue to grow until it has used all available space on the disk. This feature is especially useful when SQL Server is used as a database embedded in an application where the user does not have ready access to a system administrator. The user can let the files autogrow as needed to lessen the administrative burden of monitoring the amount of free space in the database and allocating additional space manually.

Database Filegroups

Database files can be grouped together in filegroups for allocation and administration purposes. Some systems can improve their performance by controlling the placement of data and indexes onto specific disk drives. File groups can aid this process. The system administrator can create filegroups for each disk drive, then assign specific tables, indexes, or the text, ntext, or image data from a table, to specific filegroups.

No file can be a member of more than one filegroup. Tables, indexes, and text, ntext, and image data can be associated with a filegroup, in which case all their pages will be allocated in that filegroup.

Log files are never a part of a filegroup. Log space is managed separately from data space.

Files in a filegroup will not autogrow unless there is no space available on any of the files in the filegroup.

There are three types of filegroups:

Primary

The primary filegroup contains the primary data file and any other files not put into another filegroup. All pages for the system tables are allocated in the primary filegroup.
User-defined

User-defined filegroups are any filegroups specified using the FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement.
Default

The default filegroup contains the pages for all tables and indexes that do not have a filegroup specified when they are created. In each database, only one filegroup at a time can be the default filegroup. Members of the db_owner fixed database role can switch the default filegroup from one filegroup to another. If no default filegroup was specified, it defaults to the primary filegroup.

SQL Server 7.0 can work quite effectively without filegroups, so many systems will not need to specify user-defined filegroups. In this case, all files are included in the primary filegroup and SQL Server 7.0 can effectively allocate data within the database. Filegroups are not the only method that can be used to distribute I/O across multiple drives.

Members of the db_owner fixed database role can back up and restore individual files or filegroups instead of backing up or restoring an entire database.

The following example creates a database with a primary data file, a user-defined filegroup, and a log file. The primary data file is in the primary filegroup and the user-defined filegroup has two secondary data files. An ALTER DATABASE statement makes the user-defined filegroup the default. A table is then created specifying the user-defined filegroup.

USE master
GO
-- Create the database with the default data
-- filegroup and the log file. Specify the
-- growth increment and the max size for the
-- primary data file.
CREATE DATABASE MyDB
ON PRIMARY
( NAME='MyDB_Primary',
FILENAME='c:\mssql7\data\MyDB_Prm.mdf',
SIZE=4,
MAXSIZE=10,
FILEGROWTH=1),
FILEGROUP MyDB_FG1
( NAME = 'MyDB_FG1_Dat1',
FILENAME = 'c:\mssql7\data\MyDB_FG1_1.ndf',
SIZE = 1MB,
MAXSIZE=10,
FILEGROWTH=1),
( NAME = 'MyDB_FG1_Dat2',
FILENAME = 'c:\mssql7\data\MyDB_FG1_2.ndf',
SIZE = 1MB,
MAXSIZE=10,
FILEGROWTH=1)
LOG ON
( NAME='MyDB_log',
FILENAME='c:\mssql7\data\MyDB.ldf',
SIZE=1,
MAXSIZE=10,
FILEGROWTH=1)
GO
ALTER DATABASE MyDB 
MODIFY FILEGROUP MyDB_FG1 DEFAULT
GO

-- Create a table in the user-defined filegroup.
USE MyDB
CREATE TABLE MyTable
( cola int PRIMARY KEY,
colb char(8) )
ON MyDB_FG1
GO

Cc917544.pdart5(en-us,TechNet.10).gif

User filegroups can be made read-only. The data cannot be altered, but the catalog can still be modified to allow work such as permissions management.

SQL Server 7.0 databases can be detached from a server and reattached to either another server or the same server. This is especially useful in making databases that are distributed for use on a customer's local SQL Server installation. For example, a company could create a database containing their current product catalog. The company could create this database on a writable compact disc drive and make the database read-only. They could then copy the compact disc and send copies to all of their field sales representatives equipped with a catalog application and SQL Server on Windows 95 laptops. The sales representatives would then have the latest catalog information.

Space Allocation and Reuse

Microsoft SQL Server is effective at quickly allocating pages to objects and reusing space freed up by deleted rows. These operations are internal to the system and use data structures not visible to users, yet these processes and structures are occasionally referenced in SQL Server messages. This topic is an overview of the space allocation algorithms and data structures to give users and administrators the knowledge needed to understand references to the terms in messages generated by SQL Server.

SQL Server version 7.0 introduces some significant changes to the internal data structures used to manage the allocation and reuse of pages. These data structures are not visible to users, so these changes do not affect users other than by improving speed.

SQL Server 7.0 autoshrinks databases that have a large amount of free space. Only those databases where the autoshrink option has been set to true are candidates for this process. The server checks the space usage in each database periodically. If a database is found with a lot of empty space and it has the autoshrink option set to true, SQL Server reduces the size of the files in the database. You can also use SQL Server Enterprise Manager or the DBCC SHRINKDATABASE statement to shrink the files of a database manually.

Whenever SQL Server shrinks a database, it first relocates used pages from the sections being released. You can only shrink a database to the point where it has no free space remaining.

Managing Extent Allocations and Free Space

The Microsoft SQL Server data structures that track free space have a relatively simple structure. This has two benefits:

The free space information is densely packed, so there are relatively few pages containing this information.

This increases speed by reducing the amount of disk reads necessary to retrieve allocation information, and increasing the chance the allocation pages will remain in memory, eliminating even more reads.
Most of the allocation information is not chained together, which simplifies the maintenance of the allocation information.

Each page allocation or deallocation can be performed quickly, decreasing the contention between concurrent tasks needing to allocate or free pages.

SQL Server uses two types of allocation maps to record the allocation of extents:

Global Allocation Map (GAM)

GAM pages record what extents have been allocated. Each GAM covers 64,000 extents, or nearly 4 GB of data. The GAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is free; if the bit is 0, the extent is allocated.
Shared Global Allocation Map (SGAM)

SGAM pages record what extents are currently used as mixed extents and have at least one unused page. Each SGAM covers 64,000 extents, or nearly 4 GB of data. The SGAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is being used as a mixed extent and has free pages; if the bit is 0, the extent is not being used as a mixed extent, or it is a mixed extent whose pages are all in use.

Each extent has the following bit patterns set in the GAM and SGAM based on its current use.

Current use of extent	GAM bit setting	SGAM bit setting
Free, not in use	1	0
Uniform extent, or full mixed extent	0	0
Mixed extent with free pages	0	1

This results in simple extent management algorithms. To allocate a uniform extent, SQL Server searches the GAM for a 1 bit and sets it to 0. To find a mixed extent with free pages, SQL Server searches the SGAM for a 1 bit. To allocate a mixed extent, SQL Server searches the GAM for a 1 bit, sets it to 0, and then also sets the corresponding bit in the SGAM to 1. To free an extent, SQL Server ensures the GAM bit is set to 1 and the SGAM bit is set to 0. The algorithms actually used internally by SQL Server are more sophisticated than what is stated here (SQL Server spreads data evenly around a database), but even the real algorithms are simplified by not having to manage chains of extent allocation information.

Page Free Space (PFS) pages record whether an individual page has been allocated, and the amount of space free on each page. Each PFS page covers 8,000 pages. For each page, the PFS has a bitmap recording whether the page is empty, 1-50% full, 51-80% full, 81-95% full, or 96-100% full.

After an extent has been allocated to an object, then SQL Server uses the PFS pages to record which pages in the extent are allocated or free, and how much free space is available for use. This information is used when SQL Server has to allocate a new page, or when it needs to find a page with free space available to hold a newly inserted row.

A PFS page is the first page after the file header page in a data file (with page number 1). Next comes a GAM (with page number 2) followed by an SGAM (page 3). There is a PFS page each 8,000 pages after the first. There is another GAM each 64,000 extents after the first GAM on page 2, and another SGAM each 64,000 extents after the first SGAM on page 3.

Cc917544.pdart6(en-us,TechNet.10).gif

Managing Space Used by Objects

Index Allocation Map (IAM) pages map the extents in a database file used by a heap or index. Each heap or index has one or more IAM pages recording all the extents allocated to the object. A heap or index has at least one IAM for each file on which it has extents. A heap or index may have more than one IAM on a file if the range of the extents for the heap or index on the file exceeds the range that an IAM can record.

Cc917544.pdart7(en-us,TechNet.10).gif

IAM pages are allocated as needed for each object and are located randomly in the file. sysindexes.dbo.FirstIAM points to the first IAM page for an object, and all the IAM pages for that object are linked in a chain.

Cc917544.pdart8(en-us,TechNet.10).gif

An IAM page has a header indicating the starting extent of the range of extents mapped by the IAM. The IAM also has a large bitmap in which each bit represents one extent. The first bit in the map represents the first extent in the range, the second bit represents the second extent, and so on. If a bit is 0, the extent it represents is not allocated to the object owning the IAM. If the bit is 1, the extent it represents is allocated to the object owning the IAM page.

When Microsoft SQL Server needs to insert a new row and no space is available in the current page, it uses the IAM and PFS pages to find a page with enough space to hold the row. SQL Server uses the IAM pages to find the extents allocated to the object. For each extent, SQL Server searches the PFS pages to see if there is a page with enough free space to hold the row. Each IAM and PFS page covers a large number of data pages, so there are few IAM and PFS pages in a database. This means that the IAM and PFS pages are generally in memory in the SQL Server buffer pool, so they can be searched quickly.

SQL Server allocates a new extent to an object only when it cannot find a page in an existing extent with enough space to hold the row being inserted. SQL Server allocates extents from those available in the filegroup using a proportional allocation algorithm. If a filegroup has two files, one of which has twice the free space of the other, two pages will be allocated from the file with more empty space for every one page allocated from the other file. This means that every file in a filegroup should have a similar percentage of space used.

Table and Index Architecture

Objects in a Microsoft SQL Server version 7.0 database are stored as a collection of 8 KB pages. This topic describes how the pages for tables and indexes are organized.

The data for each table is stored in a collection of 8 KB data pages. Each data page has a 96-byte header containing system information such as the ID of the table that owns the page and pointers to the next and previous pages for pages linked in a list. A row offset table is at the end of the page. Data rows fill the rest of the page.

Cc917544.dbin10(en-us,TechNet.10).gif

SQL Server 7.0 tables use one of two methods to organize their data pages:

Clustered tables are tables that have a clustered index.

The data rows are stored in order based on the clustered index key. The data pages are linked in a doubly-linked list. The index is implemented as a B-tree index structure that supports fast retrieval of the rows based on their clustered index key values.
Heaps are tables that have no clustered index.

The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

SQL Server also supports up to 249 nonclustered indexes on each table. The nonclustered indexes have a B-tree index structure similar to the one in clustered indexes. The difference is that nonclustered indexes have no effect on the order of the data rows. Clustered tables keep their data rows in order based on the clustered index key. The collection of data pages for a heap is not affected if nonclustered indexes are defined for the table. The data pages remain in a heap unless a clustered index is defined.

The pages holding text, ntext, and image data are managed as a single unit for each table. All of the text, ntext, and image data for a table is stored in one collection of pages.

All of the page collections for tables and indexes are anchored by page pointers in the sysindexes table. Every table has one collection of data pages, plus additional collections of pages to implement each index defined for the table.

Each table and index has a row in sysindexes uniquely identified by the combination of the object identifier (id) column and the index identifier (indid) column. The allocation of pages to table and index is managed by a chain of IAM pages. The column sysindexes.FirstIAM points to first IAM page in the chain of IAM pages managing the space allocated to the table or index.

Each table has a set of rows in sysindexes:

A heap has a row in sysindexes with indid = 0.

The FirstIAM column points to the IAM chain for the collection of data pages for the table. The server uses the IAM pages to find the pages in the data page collection because they are not linked together.
A clustered index has a row in sysindexes with indid = 1.

The root column points to the top of the clustered index b-tree. The server uses the index B-tree to find the data pages.
Each nonclustered index created for the table has a row in sysindexes.

The values for indid in the rows for each nonclustered index range from 2 to 251. The root column points to the top of the nonclustered index B-tree.
Each table that has at least one text, ntext, or image column also has a row in sysindexes with indid = 255.

The column FirstIAM points to the chain of IAM pages that manage the text, ntext, and image pages.

In earlier versions of SQL Server, sysindexes.first always pointed to the start of a heap, the start of the leaf level of an index, or the start of a chain of text and image pages. In SQL Server 7.0, sysindexes.first is largely unused. In earlier versions of SQL Server, sysindexes.root in a row with indid = 0 pointed to the last page in a heap. In SQL Server 7.0, sysindexes.root in a row with indid = 0 is unused.

Distribution Statistics

All indexes have distribution statistics that describe the selectivity and distribution of the key values in the index. Selectivity is a property that relates to how many rows are typically identified by a key value. A unique key has high selectivity, a key value that is found in 1,000 rows has poor selectivity. The selectivity and distribution statistics are used by Microsoft SQL Server to optimize its navigation through tables when processing Transact-SQL statements. The distribution statistics are used to estimate how efficient an index would be in retrieving data associated with a key value or range specified in the query. The statistics for each index are not limited to a single page but are stored as a long string of bits across multiple pages in the same way image data is stored. The column sysindexes.statblob points to this distribution data. You can use the DBCC SHOW_STATISTICS statement to get a report on the distribution statistics for an index.

Distribution statistics may also be maintained for unindexed columns. These can be defined manually using the CREATE STATISTICS statement or created automatically by the query optimizer. Statistics on unindexed columns count against the limit of 249 nonclustered indexes allowed on a table.

To be useful to the optimizer, distribution statistics must be kept reasonably current. The distribution statistics should be refreshed anytime there are significant numbers of changes to keys in the index. Distribution statistics can be updated manually using the UPDATE STATISTICS statement. SQL Server 7.0 can also detect when distribution statistics are out of date and update the statistics automatically. This update is performed by the task that detected that the statistics needed to be updated. The update is performed using a sophisticated sampling method that minimizes the effect of the update on transaction throughput.

See Also

In Other Volumes

"Statistical Information" in Microsoft SQL Server Database Developer's Companion

Heap Structures

Heaps have one row in sysindexes with indid = 0. The column sysindexes.FirstIAM points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap. Microsoft SQL Server uses the IAM pages to navigate through the heap. The data pages and the rows within them are not in any specific order, and are not linked together. The only logical connection between data pages is that recorded in the IAM pages.

Table scans or serial reads of a heap are done by scanning the IAMs to find the extents holding pages for the heap. Because the IAM represents extents in the same order they exist in the file, this means that serial heap scans progress uniformly down the file. A table scan on a heap works as follows:

Reads the first IAM on the first filegroup and scans all the extents on that IAM.
Repeat the process for each IAM for the heap in the file.
Repeat steps 1 and 2 for each file in the database or filegroup until the last IAM for the heap has been processed.

This is more efficient than the data page chains in earlier versions of SQL Server where the data page chain often took a somewhat random path through the files of a database. It also means that the rows are not returned in the order in which they were inserted.

Cc917544.dbin11(en-us,TechNet.10).gif

Clustered Indexes

Clustered indexes have one row in sysindexes with indid = 1. The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point the key value in the inserted row fits in the ordering sequence.

Microsoft SQL Server indexes are organized as B-trees. Each page in an index holds a page header followed by index rows. Each index row contains a key value and a pointer to either a page or a data row. Each page in an index is called an index node. The top node of the B-tree is called the root node. The bottom layer of nodes in the index are called the leaf nodes and are linked together in a doubly-linked list. In a clustered index, the data pages make up the leaf nodes. Any index levels between the root and the leaves are collectively known as intermediate levels.

For a clustered index, sysindexes.root points to the top of the clustered index. SQL Server navigates down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server navigates through the index to find the starting key value in the range, and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

This illustrates the structure of a clustered index.

Cc917544.bokind2(en-us,TechNet.10).gif

Nonclustered Indexes

Nonclustered indexes have the same B-tree structure as clustered indexes, with two significant differences:

The data rows are not sorted and stored in order based on their nonclustered keys.
The leaf layer of a nonclustered index does not consist of the data pages.

Instead, the leaf nodes contain index rows. Each index row contains the nonclustered key value and one or more row locators that point to the data row (or rows if the index is not unique) having the key value.

Nonclustered indexes can be defined on either a table with a clustered index or a heap. In Microsoft SQL Server version 7.0, the row locators in nonclustered index rows have two forms:

If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file ID, page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 7.0 adds an internal value to duplicate keys to make them unique. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

Because nonclustered indexes store clustered index keys as their row locators, it is important to keep clustered index keys as small as possible. Do not choose large columns as the keys to clustered indexes if a table also has nonclustered indexes.

Cc917544.bokind1(en-us,TechNet.10).gif

text, ntext, and image Data

Text, ntext, and image values are not stored as part of the data row but in a separate collection of pages of their own. For each text, ntext, or image value, all that is stored in the data row is a 16-byte pointer. For each row, this pointer points to the location of the text, ntext, or image data. A row containing multiple text, ntext, or image columns has one pointer for each text, ntext, or image column.

Each table has only one collection of pages to hold text, ntext, and image data. The sysindexes row that has indid = 255 is the anchor for the collection. The text, ntext, and image data for all the rows in the table is interleaved in this collection of text and image pages.

In Microsoft SQL Server version 7.0, individual text, ntext, and image pages are not limited to holding data for only one occurrence of a text, ntext, or image column. A text, ntext, or image page can hold data from multiple rows; the page can even have a mix of text, ntext, and image data.

While the user always works with text, ntext, and image data as if it is a single long string of bytes, the data is not stored in that format. The data is stored in a collection of 8 KB pages that are not necessarily located next to each other. In SQL Server 7.0, the pages are organized logically in a B-tree structure, while in earlier versions of SQL Server they were linked together in a page chain. The advantage of the method used by SQL Server 7.0 is that operations starting in the middle of the string are more efficient. SQL Server 7.0 can quickly navigate the B-tree, while older versions of SQL Server had to scan through the page chain. The structure of the B-tree differs slightly depending on whether there is less than 32 KB of data or more.

If there is less than 32 KB of data, the text pointer in the data row points to an 84-byte text root structure. This forms the root node of the B-tree structure. The root node points to the blocks of text, ntext, or image data.

Cc917544.tidart1(en-us,TechNet.10).gif

While the data for text, ntext, and image columns is arranged logically in a B-tree, both the root node and the individual blocks of data are spread throughout the chain of text, ntext, and image pages for the table. They are placed wherever there is space available. The size of each block of data is determined by the size written by an application. Small blocks of data will be combined to fill a page. If there is less than 64 bytes of data, it is all stored in the root structure.

For example, if an application first writes 1 KB of image data, this is stored as the first 1 KB block of image data for the row. If the application then writes 12 KB of image data, then 7 KB is combined with the first 1 KB block so the first block becomes 8 KB. The remaining 5 KB forms the second block of image data. (The actual capacity of each text, ntext, or image page is 8080 bytes of data.)

Cc917544.tidart2(en-us,TechNet.10).gif

Because the blocks of text, ntext, or image data and the root structures can all share space on the same text, ntext, or image pages, SQL Server 7.0 uses less space with small amounts of text, ntext, or image data than earlier versions of SQL Server. For example, if you insert 20 rows that each have 200 bytes of data in a text column, the data and all the root structures can all fit on the same 8 KB page.

If the amount of data for one occurrence of a text, ntext, or image column exceeds 32 KB, then SQL Server starts building intermediate nodes between the data blocks and the root node.

Cc917544.tidart3(en-us,TechNet.10).gif

The root structure and the data blocks are interleaved throughout the text, ntext, or image pages in the same manner as described earlier. The intermediate nodes, however, are stored in pages that are not shared between occurrences of text, ntext, or image columns. A page storing intermediate nodes contains only intermediate nodes for one text, ntext, or image data value in one data row.

Transaction Log Architecture

Every Microsoft SQL Server database has a transaction log that records all transactions and the modifications made by the transactions in the database. This record of transactions and their modifications serves three functions:

Recovery of individual transactions.

If an application issues a ROLLBACK statement, or if SQL Server detects an error such as the loss of communication with a client, the log records are used to roll back the modifications made by an incomplete transaction.
Recovery of all incomplete transactions when SQL Server is started.

If a server running SQL Server fails, the databases may be left in a state where some modifications were never written from the buffer cache to the data files, and there may be some modifications from incomplete transactions in the data files. When a copy of SQL Server is started, it runs a recovery of each database. Every modification recorded in the log which may not have been written to the data files is rolled forward. Every incomplete transaction found in the transaction log is then rolled back to ensure the integrity of the database is preserved.
Rolling a restored database forward to the point of failure.

Each time a full or differential database backup is restored, recovery is run to roll back any incomplete transactions. After a database backup has been restored, transaction log backups can be used to roll forward all completed transactions on the log backup. This allows a database to be restored to the point at which the server failed.

SQL Server version 7.0 introduces several transaction log improvements. The characteristics of the transaction log are:

The transaction log is not implemented as a table but as a separate file or set of files in the database. The log cache is managed separately from the buffer cache for data pages, resulting in simpler, faster, and more robust code within the database engine.
The format of log records and pages is not constrained to follow the format of data pages.
The transaction log can be implemented on several files. The files can be defined to autogrow as needed. This reduces the potential of running out of space in the transaction log, while at the same time reducing administrative overhead.
The mechanism to truncate unused parts of the log is quick and has minimal effect on transaction throughput.

Write-Ahead Transaction Log

Microsoft SQL Server, like many relational databases, uses a write-ahead log. A write-ahead log ensures that no data modifications are written to disk before the associated log record.

SQL Server maintains a buffer cache into which it reads data pages when data must be retrieved. Data modifications are not made directly to disk, but are instead made to the copy of the page in the buffer cache. The modification is not written to disk until the lazywriter process schedules a write for the page. Writing a modified data page from the buffer cache to disk is called flushing the page. A page which has been modified in the cache but is not yet written to disk is called a dirty page.

At the time a modification is made to a page in the buffer, a log record is built in the log cache recording the modification. This log record must be written to disk before the associated dirty page is flushed from the buffer cache to disk. If the dirty page were flushed before the log record, it would create a modification on disk that could not be rolled back if the server failed before the log record were written to disk. SQL Server has logic that prevents a dirty page from being flushed before the associated log record. Because log records are always written ahead of the associated data pages, the log is called a write-ahead log.

See Also

In This Volume

Transactions Architecture

Backup/Restore Architecture

Transaction Log Logical Architecture

The Microsoft SQL Server version 7.0 transaction log operates logically as if it is a serial string of log records. Each log record is identified by a log sequence number (LSN). Each new log record is written to the logical end of the log with an LSN higher than the LSN of the record before it.

In SQL Server 7.0, log records for data modifications record either the logical operation performed or before and after images of the modified data. A before image is a copy of the data before the operation is performed, an after image is a copy of the data after the operation has been performed. The steps to recover an operation depend on the type of log record:

Many different types of operations are recorded in the transaction log, including:

The start and end of each transaction.
Every data modification (insert, update, or delete). This includes changes to system tables made by system stored procedures or Data Definition Language (DDL) statements.
Every extent allocation or deallocation.
The creation or dropping of a table or index.

Log records are stored in a serial sequence as they are created. Each log record is stamped with the ID of the transaction to which it belongs. For each transaction, all log records associated with the transaction are singly-linked in a chain using backward pointers that speed the rollback of the transaction.

Checkpoints and the Active Portion of the Log

Checkpoints minimize the portion of the log that must be processed during a full recovery of a database. During a full recovery, two types of actions must be performed:

The log may contain records of modifications that were not flushed to disk before the system stopped. These modifications must be rolled forward.
All the modifications associated with incomplete transactions (transactions for which there is no COMMIT or ROLLBACK log record) must be rolled back.

Checkpoints flush dirty data pages from the buffer cache of the current database, minimizing the number of modifications that have to be rolled forward during a recovery. To reduce the effect of a checkpoint while a system is running, Microsoft SQL Server version 7.0 does not flush every dirty page in the buffer cache on a checkpoint. Instead, it flushes every dirty page that was dirty at the time of the last checkpoint and which still has not been flushed. Because a dirty page can wait for an extra checkpoint before being forced to disk, the SQL Server 7.0 lazywriter process gains a greater opportunity to flush dirty pages at times of low I/O activity. Few pages have to be written immediately at checkpoints. It also means that SQL Server 7.0 starts the roll forward phase of recovery at the second-to-last checkpoint in the log, not the last checkpoint as in earlier versions of SQL Server.

A SQL Server 7.0 checkpoint performs these processes in the current database:

Writes to disk all dirty data pages that were recorded as dirty at the last checkpoint and have not been flushed since that time.
Writes to the log file a list of all outstanding, active transactions.
Writes to disk all dirty log pages.
Records to the log file a list of all remaining dirty data pages.

Writes the LSN of the first log image at which a system-wide recovery must start processing the log to the database boot page. This LSN is called the Minimum Recovery LSN (MinLSN) and is the minimum of:
- The LSN of the checkpoint.
- The LSN of the oldest recorded dirty data page.
- The LSN of the start of the oldest active transaction.
- The LSN of the start of the oldest outstanding replication transaction.
Stores the information recorded for the checkpoint in a chain of checkpoint log records. The LSN of the start of this chain is written to the database boot page.
Deletes all log records before the new MinLSN, if the trunc. log on chkpt. database option is set on.

The portion of the log file from the MinLSN to the end of the log is called the active portion of the log. This is the portion of the log required to do a full recovery of the database. No part of the active log can ever be truncated. All log truncation must be done from the parts of the log before the MinLSN.

This is a simplified version of the end of a transaction log with two active transactions. The checkpoint records have been compacted to a single record.

Cc917544.tranlog1(en-us,TechNet.10).gif

LSN 148 is the last record in the transaction log. At the time the checkpoint recorded at LSN 147 was processed, Tran 1 had been committed and Tran 2 was the only active transaction. That makes the first log record for Tran 2 the oldest log record for a transaction active at the time of the last checkpoint. This makes LSN 142, the begin transaction record for Tran 2, the MinLSN.

Checkpoints occur:

Whenever a CHECKPOINT statement is executed. The current database for the connection is checkpointed.
When sp_dboption is used to change a database option. sp_dboption checkpoints the database in which the change is made.

When the server is stopped by:
- Executing a SHUTDOWN statement.
- Using the SQL Server Service Control Manager to stop the MSSQLServer service.
Either of these methods checkpoint each database on the server.
When SQL Server periodically generates automatic checkpoints in each database to reduce the amount of time SQL Server would take to recover the database.

Automatic Checkpoints

SQL Server 7.0 always generates automatic checkpoints, regardless of the setting of the trunc. log on chkpt. database option. The only effects of the option on the automatic checkpoints in SQL Server 7.0 is whether the checkpoints truncate the inactive portion of the log, and how the interval between automatic checkpoints is determined.

The interval between automatic checkpoints is based on the number of records in the log, not time. The time interval between automatic checkpoints can be highly variable. The time interval between automatic checkpoints is long if few modifications are made in the database. Automatic checkpoints occur frequently if a lot of data is modified.

The interval between automatic checkpoints is calculated from the recovery interval server configuration option. This option specifies the maximum time SQL Server should use to recover a database during a system restart. SQL Server estimates how many log records it can process in the recovery interval during a recovery operation. The interval between automatic checkpoints also depends on whether the database is in log truncate mode. For more information about log truncate mode, see "Truncating the Transaction Log" in this volume.

If the database is not in log truncate mode, an automatic checkpoint is generated whenever the number of log records reaches the number SQL Server estimates it can process during the time specified in the recovery interval option.

If the database is in log truncate mode, an automatic checkpoint is generated whenever the number of log records reaches the lesser of these two values:
- The log becomes 70 percent full.
- The number of log records reaches the number SQL Server estimates it can process during the time specified in the recovery interval option.

Long-running Transactions

The active portion of the log must include every part of all uncommitted transactions. An application that starts a transaction and does not commit it or roll it back prevents SQL Server from advancing the MinLSN. This can cause two types of problems:

If the system is shut down after the transaction has performed many uncommitted modifications, the recovery phase of the subsequent restart can take considerably longer than the amount of time specified in the recovery interval option.
The log may grow very large, even if the trunc. log on chkpt. database option has been turned on.

See Also

In This Volume

Backup/Restore Architecture

Transaction Recovery

Freeing and Writing Buffer Pages

In Other Volumes

"CHECKPOINT" in Microsoft SQL Server Transact-SQL and Utilities Reference

Truncating the Transaction Log

If log records were never deleted from the transaction log, the log would keep growing until it filled all the available space on the disks holding the log. At some point in time, old log records no longer necessary for recovering or restoring a database must be deleted to make way for new log records. The process of deleting these log records is called truncating the log.

The active portion of the transaction log can never be truncated. The active portion of the log is the part of the log needed to recover the database at any time. It must always be present in the database in case the server fails because it will be required to recover the database when the server is restarted. The record at the start of the active portion of the log is identified by the minimum recovery log sequence number (MinLSN).

The backup and restore process chosen for a database determines how much of the transaction log in front of the active portion must be retained in the database. While the log records in front of the MinLSN play no role in recovery, they are required to roll forward updates when using log backups to restore a database to the point of failure.

The log records before the MinLSN are only needed to maintain a sequence of log backups. If a log backup sequence is not being maintained, all log records before the MinLSN can be truncated at any time. If a log backup sequence is being maintained, the part of the log before the MinLSN cannot be truncated until those log records have been copied to a log backup.

If a database log backup sequence is not being maintained for a database, the database can be set into log truncate mode. The trunc. log on chkpt. database option must be set to TRUE for a database to be eligible for log truncate mode. Whether the database is actually in log truncate mode also depends on the state of the database:

The database comes out of log truncate mode when a BACKUP DATABASE statement is executed. This is because a BACKUP DATABASE statement is the starting point of a log backup sequence.

The database remains out of log truncate mode until an event that invalidates the log backup sequence occurs. These events are:
- A BACKUP LOG statement is executed that references the database and specifies either the NO_LOG or TRUNCATE_ONLY options.
- A nonlogged operation is performed in the database, such as a nonlogged bulk copy operation or a nonlogged WRITETEXT statement.
- An ALTER DATABASE statement that adds or deletes a file in the database is executed.
When in log truncate mode, the database remains in log truncate mode until either a DUMP DATABASE statement is executed or the trunc. log on chkpt. option is set to FALSE.

A log backup sequence cannot be maintained for a database that is in log backup mode. The only BACKUP LOG options that are valid for a database in log backup mode are the NO_LOG or TRUNCATE_ONLY options.

Note The tempdb database is always in log truncate mode. Log truncation always occurs on a checkpoint in tempdb regardless of the setting of the trunc. log on chkpt. option.

Log truncation occurs at these points:

At the completion of a BACKUP LOG statement.
Every time a checkpoint is processed, if the database is in truncate mode. This includes both explicit checkpoints resulting from a CHECKPOINT statement and implicit checkpoints generated by the system. For more information about the interval between automatic checkpoints, see "Checkpoints and the Active Portion of the Log" in this volume.

Transaction logs are divided internally into sections called virtual log files. Virtual log files are the unit of truncation. When a transaction log is truncated, all log records before the start of the virtual log file containing the MinLSN are deleted. For more information about virtual log files, see "Transaction Log Physical Architecture" in this volume.

The size of a transaction log is therefore controlled in one of these ways:

When a log backup sequence is being maintained, schedule BACKUP LOG statements to occur at intervals that will keep the transaction log from growing past the desired size.
When a log backup sequence is not being maintained, either set on the trunc. log on chkpt. database option, or schedule BACKUP LOG WITH NO_LOG or BACKUP LOG WITH TRUNCATE_ONLY statements at intervals frequent enough to keep the log from growing past the desired size.

This illustration shows a transaction log that has four virtual logs. The log has not been truncated after the database was created. The logical log starts at the beginning of the first virtual log and the part of virtual log 4 beyond the end of the logical file has never been used.

Cc917544.tranlog2(en-us,TechNet.10).gif

This illustration shows how the log looks after truncation. The rows before the start of the virtual log containing the MinLSN record have been truncated.

Cc917544.tranlog3(en-us,TechNet.10).gif

Truncation does not physically reduce the size of a log file, it simply marks virtual log files as inactive. Actually deleting portions of the log file is controlled by the DBCC SHRINKDATABASE and DBCC SHRINKFILE statements. These DBCC statements specify that the size of a log file is to be reduced when free space is available in the file. The unit of deletion is a virtual log file. Virtual logs in the active portion of the log cannot be deleted. If all the virtual logs in a log file are in the active portion of the log, the file does not shrink until a truncation marks one or more of the virtual logs as inactive.

See Also

In Other Volumes

"Creating and Applying Transaction Log Backups" in Microsoft SQL Server Administrator's Companion

"BACKUP" in Microsoft SQL Server Transact-SQL and Utilities Reference

"Setting Database Options" in Microsoft SQL Server Database Developer's Companion

"Truncate Method" in Microsoft SQL Server Distributed Management Objects

Transaction Log Physical Architecture

The transaction log in a database maps over one or more physical files. Conceptually, the log file is a serial string of log records. Physically, the sequence of log records must be stored efficiently in the set of physical files that implement the transaction log.

Microsoft SQL Server segments each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file. SQL Server chooses the size of the virtual log files dynamically while creating or extending log files. SQL Server tries to maintain a small number of virtual files. The size of the virtual files after a log file extension is based on the size of the existing log and the size of the new file increment. The size or number of virtual log files cannot be configured or set by administrators; it is determined dynamically by the SQL Server code.

The only time virtual log files affect system performance is if the log files are defined with small size and growth_increment values. If these log files grow to a large size through many small increments, they will have a lot of virtual log files, which can slow down recovery. It is recommended that log files be defined with a size value close to the final size needed, and also have a relatively large growth_increment value.

The transaction log is a wrap-around log file. For example, consider a database with one physical log file divided into four virtual log files. When the database is created, the logical log file begins at the start of the physical log file. New log records are added at the end of the logical log, which grows toward the end of the physical log. As truncation operations occur, the records in the virtual logs before the minimum recovery log sequence number (MinLSN) are deleted. The log in the example database would look like the one in the illustration.

Cc917544.tranlog3(en-us,TechNet.10).gif

When the end of the logical log reaches the end of the physical log file, the new log records wrap around to the start of the physical log file.

Cc917544.tranlog4(en-us,TechNet.10).gif

This cycle repeats endlessly, so long as the end of the logical log never reaches the start of the logical log. If the old log records are truncated often enough to always leave enough room for all the new log records created through the next checkpoint, the log never fills. If the end of the logical log does reach the start of the logical log, however, one of two things happens:

If autogrow is enabled for the log and space is available on the disk, the file is extended by the amount specified in growth_increment and the new log records are added to the extension.
If autogrow is not enabled, or the disk holding the log file has less free space than the amount specified in growth_increment, an 1105 error is generated.

If the log contains multiple physical log files, then the logical log will move through all of the physical log files before it wraps back to the start of the first physical log file.

See Also

In Other Volumes

"Transaction Logs" in Microsoft SQL Server Database Developer's Companion

"Transaction Log Backups" in Microsoft SQL Server Administrator's Companion

Chapter 8 - Database Architecture

Logical Database Components

Data Types and Table Structures

SQL Views

SQL Stored Procedures

Constraints, Rules, Defaults, and Triggers

Constraints

Rules

Defaults

Triggers

Code Pages and Sort Orders

Table Indexes

Full-text Catalogs and Indexes

Creating and Maintaining Full-text Indexes

Logins, Users, Roles, and Groups

Logins

Users

Roles

Groups

Owners and Permissions

System Databases and Data

Physical Database Architecture

Pages and Extents

Physical Database Files and Filegroups

Space Allocation and Reuse

Managing Extent Allocations and Free Space

Managing Space Used by Objects

Table and Index Architecture

Distribution Statistics

Heap Structures

Clustered Indexes

Nonclustered Indexes

text, ntext, and image Data

Transaction Log Architecture

Write-Ahead Transaction Log

Transaction Log Logical Architecture

Checkpoints and the Active Portion of the Log

Truncating the Transaction Log

Transaction Log Physical Architecture

Additional resources