Spatial Indexing Overview
SQL Server 2008 and later versions support spatial data. This includes support for a planar spatial data type, geometry, which supports geometric data—points, lines, and polygons—within a Euclidean coordinate system. The geography data type represents geographic objects on an area on the Earth's surface, such as a spread of land. A spatial index on a geography column maps the geographic data to a two-dimensional, non-Euclidean space.
A spatial index is defined on a table column that contains spatial data (a spatial column). Each spatial index refers to a finite space. For example, an index for a geometry column refers to a user-specified rectangular area on a plane.
In SQL Server 2008, spatial indexes are built using B-trees, which means that the indexes must represent the 2-dimensional spatial data in the linear order of B-trees. Therefore, before reading data into a spatial index, SQL Server 2008 implements a hierarchical uniform decomposition of space. The index-creation process decomposes the space into a four-level grid hierarchy. These levels are referred to as level 1 (the top level), level 2, level 3, and level 4.
Each successive level further decomposes the level above it, so each upper-level cell contains a complete grid at the next level. On a given level, all the grids have the same number of cells along both axes (for example, 4x4 or 8x8), and the cells are all one size.
The following illustration shows the decomposition for the upper-right cell at each level of the grid hierarchy into a 4x4 grid. In reality, all the cells are decomposed in this way. Thus, for example, decomposing a space into four levels of 4x4 grids actually produces a total of 65,536 level-four cells.
The decomposition of space for a spatial index is independent of the unit of measurement that the application data uses.
The cells of a grid hierarchy are numbered in a linear fashion by using a variation of the Hilbert space-filling curve. For the purpose of illustration, however, this discussion uses a simple row-wise numbering, instead of the numbering that is actually produced by the Hilbert curve. In the following illustration, several polygons that represent buildings and lines that represent streets have already been placed into a 4x4, level-1 grid. The level-1 cells are numbered from 1 through 16, starting with the upper-left cell.
The number of cells along the axes of a grid determines its density: the larger the number, the denser the grid. For example, an 8x8 grid (which produces 64 cells), is denser than a 4x4 grid (which produces 16 cells). Grid density is defined on a per-level basis.
The CREATE SPATIAL INDEX Transact-SQL statement supports a GRIDS clause that enables you to specify different grid densities at different levels. The grid density for a given level is specified by using one of the following keywords:
Number of cells
The default is MEDIUM on all levels.
You can control the decomposition process by specifying nondefault grid densities. For example, different grid densities on different levels might be useful for fine tuning an index based on size of the indexed space and the objects in the spatial column.
The grid densities of a spatial index are visible in the level_1_grid, level_2_grid, level_3_grid, and level_4_grid columns of the sys.spatial_index_tessellations catalog view.
After decomposition of an indexed space into a grid hierarchy, the spatial index reads the data from the spatial column, row-by-row. After reading the data for a spatial object (or instance), the spatial index performs a tessellation process for that object. The tessellation processfits the object into the grid hierarchy by associating the object with a set of grid cells that it touches (touched cells). Starting at level 1 of the grid hierarchy, the tessellation process proceeds breadth first across the level. Potentially, the process can continue through all four levels, one level at a time.
The output of the tessellation process is a set of touched cells that are recorded in the spatial index for the object. By referring to these recorded cells, the spatial index can locate the object in space relative to other objects in the spatial column that are also stored in the index.
To limit the number of touched cells that are recorded for an object, the tessellation process applies several tessellation rules. These rules determine the depth of the tessellation process and which of the touched cells are recorded in the index.
These rules are as follows:
The covering rule
If the object completely covers a cell, that cell is said to be covered by the object. A covered cell is counted and is not tessellated. This rule applies at all levels of the grid hierarchy. The covering rule simplifies the tessellation process and reduces the amount of data that a spatial index records.
The cells-per-object rule
This rule enforces the cells-per-object limit, which determines the maximum number of cells that can be counted for each object, except on level 1. At lower levels, the cells-per-object rule controls the amount of information that can be recorded about the object.
The deepest-cell rule
The deepest-cell rule generates the best approximation of an object by recording only the only bottom-most cells that have been tessellated for the object. Parent cells do not contribute to the cells-per-object count, and they are not recorded in the index.
These tessellation rules are applied recursively on each grid level. The rest of this section describes the tessellation rules in more detail.
If an object completely covers a cell, that cell is said to be covered by the object. For example, in the following illustration, one of the second-level cells, 15.11, is completely covered by the middle portion of an octagon.
A covered cell is counted and recorded in the index, and the cell is not tessellated any further.
The extent of tessellation of each object depends primarily on the cells-per-object limit of the spatial index. This limit defines the maximum number of cells that tessellation can count per object. Note, however, that the cells-per-object rule is not enforced for level 1, so it is possible to exceed this limit. If the level-1 count reaches, or exceeds, the cells-per-object limit, no further tessellation occurs at the lower levels.
As long as the count is less than the cells-per-object limit, the tessellation process continues. Starting with the lowest-number touched cell (for example, cell 15.6 in the preceding illustration), the process tests each cell to evaluate whether to count it or tessellate it. If tessellating a cell would exceed the cells-per-object limit, the cell is counted and not tessellated. Otherwise, the cell is tessellated, and the lower-level cells that are touched by the object are counted. The tessellation process continues in this way, breadth-wise, across the level. This process is repeated recursively for the lower-level grids of the tessellated cells until the limit is reached or there are no more cells to count.
For example, consider the preceding illustration, which shows an octagon that fits completely into cell 15 of the level-1 grid. In the figure, cell 15 has been tessellated, dissecting the octagon into nine level-2 cells. This illustration assumes that the cells-per-object limit is 9 or more. If the cells-per-object limit were 8 or less, however, cell 15 would not be tessellated, and only that cell 15 would be counted for the object.
By default, the cells-per-object limit is 16 cells per object, which provides a satisfactory trade-off between space and precision for most spatial indexes. However, the CREATE SPATIAL INDEX Transact-SQL statement supports a CELLS_PER_OBJECT=n clause that enables you to specify a cells-per-object limit between 1 and 8192, inclusive.
The cells_per_object setting of a spatial index is visible in the sys.spatial_index_tessellations catalog view.
The deepest-cell rule exploits the fact that every lower-level cell belongs to the cell above it: a level-4 cell belongs to a level-3 cell, a level-3 cell belongs to a level-2 cell, and a level-2 cell belongs to a level-1 cell. For example, an object that belongs to cell 22.214.171.124 also belongs to cell 1.1.1, cell 1.1, and cell 1. Knowledge of such cell-hierarchy relationships is built into the query processor. Therefore, only the deepest-level cells need to be recorded in the index, minimizing the information that the index needs to store.
In the following illustration, a relatively small diamond-shaped polygon is tessellated. The index uses the default cells-per-object limit of 16, which is not reached for this small object. Therefore, tessellation continues down to level 4. The polygon resides in the following level-1 through level-3 cells: 4, 4.4, and 4.4.10 and 4.4.14. However, using the deepest-cell rule, the tessellation counts only the twelve level-4 cells: 126.96.36.199-15 and 188.8.131.52-3, 184.108.40.206-7, and 220.127.116.11-11.
The behavior of a spatial index depends partly on its tessellation scheme. The tessellation scheme is data-type specific. In SQL Server 2008, spatial indexes support two tessellation schemes:
Geometry grid tessellation, which is the scheme for the geometry data type.
Geography grid tessellation, which applies to columns of the geography data type.
The tessellation_scheme setting of a spatial index is visible in the sys.spatial_index_tessellations catalog view.
Geometry Grid Tessellation Scheme
Geometry grid tessellation is the default tessellation scheme for the geometry data type, and in SQL Server 2008, it is the only such tessellation scheme. This section discusses aspects of geometry grid tessellation that are relevant to working with spatial indexes: supported methods and bounding boxes.
You can explicitly specify this tessellation scheme by using the USING GEOMETRY_GRID clause of the CREATE SPATIAL INDEX Transact-SQL statement.
Supported Geometry Methods
A spatial index is intended to reduce the cost of applying set-oriented methods to a spatial column by acting as a filter on the objects. The geometry data type provides built-in methods for constructing geometry instances that describe geometric objects and for working with those instances. Under certain conditions, spatial indexes support a number of set-oriented geometry methods, such as STIntersects() and STTouches().
For more information about the support provided by spatial indexes for geometry methods, see Geometry Methods Supported by Spatial Indexes.
The Bounding Box
Geometric data occupies a plane that can be infinite. In SQL Server 2008, however, a spatial index requires a finite space. To establish a finite space for decomposition, the geometry grid tessellation scheme requires a rectangular bounding box. The bounding box is defined by four coordinates, (x-min,y-min) and (x-max,y-max), which are stored as properties of the spatial index. These coordinates represent the following:
x-min is the x-coordinate of the lower-left corner of the bounding box.
y-min is the y-coordinate of the lower-left corner.
x-max is the x-coordinate of the upper-right corner.
y-max is the y-coordinate of upper-right corner.
These coordinates are specified by the BOUNDING_BOX clause of the CREATE SPATIAL INDEX Transact-SQL statement.
The (x-min,y-min) and (x-max,y-max) coordinates determine the placement and dimensions of the bounding box. The space outside of the bounding box is treated as a single cell that is numbered 0.
The spatial index decomposes the space inside the bounding box. The level-1 grid of the grid hierarchy fills the bounding box. To place a geometric object in the grid hierarchy, the spatial index compares the coordinates of the object to the bounding-box coordinates.
The following illustration shows the points defined by the (x-min,y-min) and (x-max,y-max) coordinates of the bounding box. The top-level of the grid hierarchy is shown as a 4x4 grid. For the purpose of illustration, the lower levels are omitted. The space outside of the bounding box is indicated by a zero (0). Note that object 'A' extends partly beyond the box, and object 'B' lies completely outside the box in cell 0.
A bounding box corresponds to some portion of an application's spatial data. Whether the bounding-box of the index completely contains the data stored in the spatial column, or only contains a portion, is up to the application. Only operations computed on objects that are entirely inside of the bounding box benefit from the spatial index. Therefore, to gain the greatest advantage from a spatial index on a geometry column, you need to specify a bounding-box that contains all or most of the objects.
The grid densities of a spatial index are visible in the bounding_box_xmin, bounding_box_ymin, bounding_box_xmax, and bounding_box_ymax columns of the sys.spatial_index_tessellations catalog view.
The Geography Grid Tessellation Scheme
This tessellation scheme applies only to a geography column. This section summarizes the methods that are supported by geography grid tessellation and discusses how geodetic space is projected onto a plane, which is then decomposed into a grid hierarchy.
You can explicitly specify this tessellation scheme by using the USING GEOGRAPHY_GRID clause of the CREATE SPATIAL INDEX Transact-SQL statement.
Supported Geography Methods
The geography data type provides built-in methods for constructing and manipulating geography instances that describe geographic objects. Under certain conditions, spatial indexes support the following set-oriented geography methods: STIntersects(), STEquals(), and STDistance(). A spatial index on a geography data type column filters the objects and reduces the performance and query cost of applying these methods to the spatial data.
For more information about the support provided by spatial indexes for geography methods, see Geography Methods Supported by Spatial Indexes.
Projection of the Geodetic Space onto a Plane
Computations on geography instances (objects) treat the space containing the objects as a geodetic ellipsoid. To decompose this space, the geography grid tessellation scheme divides the surface of the ellipsoid into its upper and lower hemispheres and then performs the following steps:
Projects each hemisphere onto the facets of a quadrilateral pyramid.
Flattens the two pyramids.
Joins the flattened pyramids to form a non-Euclidean plane.
The following illustration shows a schematic view of the three-step decomposition process. In the pyramids, the dotted lines represent the boundaries of the four facets of each pyramid. Steps 1 and 2 illustrate the geodetic ellipsoid, using a green horizontal line to represent the equatorial longitude line and a series of green vertical lines to represent several latitude lines. Step 1 shows the pyramids being projected over the two hemispheres. Step 2 shows the pyramids being flattened. Step 3 illustrates the flattened pyramids, after they have been combined to form a plane, showing a number of projected longitude lines. Notice that these projected lines are straightened and vary in length, depending on where they fall on the pyramids.
Once the space has been projected onto the plane, the plane is decomposed into the four-level grid hierarchy. Different levels can use different grid densities. The following illustration shows the plane after it has been decomposed into a 4x4 level-1 grid. For the purposes of illustration, the lower-levels of the grid hierarchy are omitted. In actuality, the plane is fully decomposed into a four-level grid hierarchy. After the decomposition process finishes, the geographic data is read, row by row, from the geography column, and the tessellation process is performed for each object in turn.
A spatial index can be created only on a spatial column. You can create spatial indexes on any spatial column in a table that supports spatial indexes, and you can create multiple spatial indexes on a given spatial column. For more information about the restrictions on spatial indexes, see Restrictions on Spatial Indexes.