Bitmaps in Microsoft SQL Server 2000
Summary: Microsoft SQL Server has been using bitmaps internally to speed up query execution since version 7.0. With the introduction of new operators in SQL Server 2000, further bitmap filtering techniques can now be applied for even faster query results from large data sets. (4 printed pages)
This article first explains the use of bitmaps in query optimization in Microsoft® SQL Server 7.0, then their enhanced application in SQL Server 2000™.
SQL Server 7.0
Microsoft SQL Server 7.0 uses bitmaps silently in all hash joins. A hash join has build and probe phases. In the build phase, all join keys of one of the joined tables (also called the outer table) are hashed into a hash table. As a byproduct of this hashing, SQL Server produces a separate bitmap where "0" represents "no key value in the outer table is hashed to this bit", and "1" means "one or more key values in the outer table hashes into this bit".
The size of the bitmap is determined during the query optimization based on the number of unique values expected in the outer table. Once all rows of the outer table are hashed, the bitmap has 0s and 1s in it. Then each key of the probe table (also called inner table) is hashed using the same hash algorithm as that used for the outer keys.
Prior to examining and searching the hash table from the build phase we examine the bitmap. If there is "0" in the corresponding entry, the row cannot have a match in the outer table and is, therefore, discarded.
Since probing the bitmap is substantially cheaper than searching the hash table, rows from the inner table that do not produce a join record may be processed substantially faster than without the bitmap. The bitmaps are created automatically and are not visible in the showplan output since they are an integral part of hash joins.
SQL Server 2000
Microsoft SQL Server 2000 employs similar bitmaps very efficiently — not only inside hash joins, but also outside join operators to eliminate rows with key values that cannot produce any join records. There is a "Bitmap Create" operator in the showplan output where the bitmaps are built. These bitmaps are introduced automatically in the query plans during query optimization. The following is an example of a query that uses a plan with such bitmaps:
SELECT S_NAME, S_ADDRESS ,S_PHONE ,S_COMMENT ,PS_PARTKEY FROM SUPPLIER ,PARTSUPP WHERE S_SUPPKEY = PS_SUPPKEY AND PS_PARTKEY between 5000 AND 5999
This query selects all suppliers from the SUPPLIER table that produce any parts in the 5000 series (partkeys between 5000 and 5999). Besides the SUPPLIER table, we are also using the PARTSUPP (parts supplier) table that contains, for each part, as many records as there are different suppliers producing the same part. Figure 1 below shows the graphical showplan generated by the SQL Server 2000.
The bitmap is created prior to the hash join on the outer input side of the join for each stream of data. Reading the above graphical showplan from right to left and from top to bottom, the scan of the PARTSUPP table is parallel. The following exchange operator (Parallelism/Repartition Streams) then distributes the rows using the key values so that they will be in corresponding streams with the redistributed rows of the SUPPLIER table prior to the parallel Hash Match (Join). The top branch is executed first, and until the hash table of the hash join is populated there is no activity on the bottom branch.
At the time the SUPPLIER table is scanned we already have the bitmaps built on the top branch using the PARTSUPP keys (PS_SUPPKEY column in this query). There is one bitmap for each stream entering the hash join. When the SUPPLIER rows enter the exchange operator right after the scan, we first decide which stream they should enter. We then discard the row if the bitmap on this stream has "0" in the entry that corresponds to the key value (S_SUPPKEY column). Therefore, the non-qualifying rows are eliminated even before they are placed in the proper output stream of the exchange.
SQL Server 2000 uses these bitmaps only in parallel query plans. This is because without the exchange operator there is no additional saving on top of the bitmaps inside the hash joins. In addition to the above scenario with hash joins, SQL Server 2000 will also use such bitmaps for merge joins, but again only in parallel plans, and where a SORT operator exists on the outer branch. The SORT operator causes SQL Server to process all outer rows prior to processing rows from the inner table and thereby allows us to build the bitmap. Without a SORT operator on the outer branch, the rows of both the outer and inner tables of the merge join are processed simultaneously, thereby making bitmap use unfeasible.
Test results show speed improvements
In general, the performance improvement caused by the bitmap deployment depends on the number of rows that are filtered out. This number may vary; therefore the speed increase may range from immeasurable to very significant, depending on the cost of other operators employed in the query execution.
Figure 2 below illustrates the speed improvements we observed testing three complex queries against large database (134 GB tables, 45 GB indexes). The tests were conducted in our lab using an 8-way, 550 MHz machine with 4 GB of RAM.
- Query A is a join of three tables (the largest one has about 100 GB of data) with an aggregation and ordering of the result.
- Query B is a query with a correlated subquery.
- Query C is a join of six tables with an aggregation on top of the join.
The use of bitmaps in query optimization is one of many techniques employed by SQL Server 2000 to provide the fastest results from large data sets, like those found in enterprise databases. By reducing the number of rows that need to be processed, inner and outer join queries are more efficient, data is returned rapidly and server processing is minimized.