Limiting Result Sets by Using TABLESAMPLE
The TABLESAMPLE clause limits the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows. For example:
TABLESAMPLE (10 PERCENT) /*Return a sample 10 percent of the rows of the result set. */ TABLESAMPLE (15 ROWS) /* Return a sample of 15 rows from the result set. */.
TABLESAMPLE cannot be applied to derived tables, tables from linked servers, and tables derived from table-valued functions, rowset functions, or OPENXML. TABLESAMPLE cannot be specified in the definition of a view or an inline table-valued function.
The syntax for the TABLESPACE clause is as follows:
TABLESAMPLE [SYSTEM] (sample_number [ PERCENT | ROWS ] )
[ REPEATABLE (repeat_seed) ]
|When TABLESAMPLE is used against databases that are upgraded to SQL Server 2005, the compatibility level of the database must be set to 90. To set the database compatibility level, see sp_dbcmptlevel (Transact-SQL).|
You can use TABLESAMPLE to quickly return a sample from a large table when either of the following conditions is true:
The sample does not have to be a truly random sample at the level of individual rows.
Rows on individual pages of the table are not correlated with other rows on the same page.
If you really want a random sample of individual rows, modify your query to filter out rows randomly, instead of using TABLESAMPLE. For example, the following query uses the |
SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.
TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify
TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned. The TOP(n) operator can be used to limit the number of rows to a specified maximum.
When a number of rows is specified, instead of a percentage based on the total number of rows in the table, that number is converted into a percentage of the rows and, therefore, pages that should be returned. The TABLESAMPLE operation is then performed with that computed percentage.
If the table is made up of a single page, either all rows on the page are returned or none of the rows are returned. In this case, TABLESAMPLE SYSTEM can only return 100 percent or 0 percent of the rows on a page, regardless of the number of rows on the page.
Using TABLESAMPLE SYSTEM for a specific table limits the execution plan to use a table scan (a scan of the heap, or of the clustered index if one exists) on that table. Although the plan shows a table scan is performed, only those pages that are included in the result set are actually required to be read from the data file.
|The TABLESAMPLE SYSTEM clause should be used with some caution, and with some understanding of some of the implications of using sampling. For example, a join of two tables is likely to return a match for each row in both tables; however, if TABLESAMPLE SYSTEM is specified for either of the two tables, some rows returned from the unsampled table are unlikely to have a matching row in the sampled table. This behavior might lead you to suspect that a data consistency problem exists in the underlying tables, when the data is actually valid. Similarly, if TABLESAMPLE SYSTEM is specified for both tables that are joined, the perceived problem could be even worse.|
The REPEATABLE option causes a selected sample to be returned again. When REPEATABLE is specified with the same repeat_seed value, SQL Server returns the same subset of rows, as long as no changes have been made to the table. When REPEATABLE is specified with a different repeat_seed value, SQL Server will typically return a different sample of the rows in the table. The following actions to the table are considered changes: inserting, updating, deleting, index rebuilding, index defragmenting, restoring a database, and attaching a database.
A. Selecting a percentage of rows
Person.Contact table contains 19,972 rows. The following example returns approximately 10 percent of the rows. The number of rows returned usually changes every time that the statement is executed.
USE AdventureWorks ; GO SELECT FirstName, LastName FROM Person.Contact TABLESAMPLE (10 PERCENT) ;
B. Selecting a percentage of rows with a seed value
The following example returns the same set of rows every time that it is executed. The seed value of
205 was chosen arbitrarily.
USE AdventureWorks ; GO SELECT FirstName, LastName FROM Person.Contact TABLESAMPLE (10 PERCENT) REPEATABLE (205) ;
C. Selecting a number of rows
The following example returns approximately
100 rows. The actual number of rows that are returned can vary significantly. If you specify a small number, such as 5, you might not receive results in the sample.
USE AdventureWorks ; GO SELECT FirstName, LastName FROM Person.Contact TABLESAMPLE (100 ROWS) ;
12 December 2006
14 April 2006