SQL Statement Processing
Processing a single SQL statement is the most basic way that SQL Server executes SQL statements. The steps used to process a single SELECT statement that references only local base tables (no views or remote tables) illustrates the basic process.
A SELECT statement is nonprocedural; it does not state the exact steps that the database server should use to retrieve the requested data. This means that the database server must analyze the statement to determine the most efficient way to extract the requested data. This is referred to as optimizing the SELECT statement. The component that does this is called the query optimizer. The input to the optimizer consists of the query, the database schema (table and index definitions), and the database statistics. The output of the optimizer is a query execution plan, sometimes referred to as a query plan or just a plan. The contents of a query plan are described in more detail later in this topic.
The inputs and outputs of the query optimizer during optimization of a single SELECT statement are illustrated in the following diagram:
A SELECT statement defines only the following:
The format of the result set. This is specified mostly in the select list. However, other clauses such as ORDER BY and GROUP BY also affect the final form of the result set.
The tables that contain the source data. This is specified in the FROM clause.
How the tables are logically related for the purposes of the SELECT statement. This is defined in the join specifications, which may appear in the WHERE clause or in an ON clause following FROM.
The conditions that the rows in the source tables must satisfy to qualify for the SELECT statement. These are specified in the WHERE and HAVING clauses.
A query execution plan is a definition of the following:
The sequence in which the source tables are accessed.
Typically, there are many sequences in which the database server can access the base tables to build the result set. For example, if the SELECT statement references three tables, the database server could first access TableA, use the data from TableA to extract matching rows from TableB, and then use the data from TableB to extract data from TableC. The other sequences in which the database server could access the tables are:
TableC, TableB, TableA, or
TableB, TableA, TableC, or
TableB, TableC, TableA, or
TableC, TableA, TableB
The methods used to extract data from each table.
Generally, there are different methods for accessing the data in each table. If only a few rows with specific key values are required, the database server can use an index. If all the rows in the table are required, the database server can ignore the indexes and perform a table scan. If all the rows in a table are required but there is an index whose key columns are in an ORDER BY, performing an index scan instead of a table scan may save a separate sort of the result set. If a table is very small, table scans may be the most efficient method for almost all access to the table.
The process of selecting one execution plan from potentially many possible plans is referred to as optimization. The query optimizer is one of the most important components of a SQL database system. While some overhead is used by the query optimizer to analyze the query and select a plan, this overhead is typically saved several-fold when the query optimizer picks an efficient execution plan. For example, two construction companies can be given identical blueprints for a house. If one company spends a few days at the beginning to plan how they will build the house, and the other company begins building without planning, the company that takes the time to plan their project will probably finish first.
The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The query optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the query optimizer does not analyze all possible combinations. Instead, it uses complex algorithms to find an execution plan that has a cost reasonably close to the minimum possible cost.
The SQL Server query optimizer does not choose only the execution plan with the lowest resource cost; it chooses the plan that returns results to the user with a reasonable cost in resources and that returns the results the fastest. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.
The query optimizer relies on distribution statistics when it estimates the resource costs of different methods for extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column. For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number (VIN). An index on the VIN is more selective than an index on the manufacturer. If the index statistics are not current, the query optimizer may not make the best choice for the current state of the table. For more information about keeping index statistics current, see Using Statistics to Improve Query Performance.
The query optimizer is important because it enables the database server to adjust dynamically to changing conditions in the database without requiring input from a programmer or database administrator. This enables programmers to focus on describing the final result of the query. They can trust that the query optimizer will build an efficient execution plan for the state of the database every time the statement is run.
The basic steps that SQL Server uses to process a single SELECT statement include the following:
The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators, and identifiers.
A query tree, sometimes referred to as a sequence tree, is built describing the logical steps needed to transform the source data into the format required by the result set.
The query optimizer analyzes different ways the source tables can be accessed. It then selects the series of steps that returns the results fastest while using fewer resources. The query tree is updated to record this exact series of steps. The final, optimized version of the query tree is called the execution plan.
The relational engine starts executing the execution plan. As the steps that require data from the base tables are processed, the relational engine requests that the storage engine pass up data from the rowsets requested from the relational engine.
The relational engine processes the data returned from the storage engine into the format defined for the result set and returns the result set to the client.
The basic steps described for processing a SELECT statement apply to other SQL statements such as INSERT, UPDATE, and DELETE. UPDATE and DELETE statements both have to target the set of rows to be modified or deleted. The process of identifying these rows is the same process used to identify the source rows that contribute to the result set of a SELECT statement. The UPDATE and INSERT statements may both contain embedded SELECT statements that provide the data values to be updated or inserted.
Even Data Definition Language (DDL) statements, such as CREATE PROCEDURE or ALTER TABLE, are ultimately resolved to a series of relational operations on the system catalog tables and sometimes (such as ALTER TABLE ADD COLUMN) against the data tables.