Introduction to New T-SQL Programmability Features in SQL Server 2008

SQL Server Technical Article

 

Writer: Itzik Ben-Gan, SolidQ.com/InsideTSQL.com

Technical Reviewers: Michael Wang, Microsoft Corp.

Umachandar Jayachandran (UC), Microsoft Corp.

Jim Hogg, Microsoft Corp.

Michael Rys, Microsoft Corp.

 

Published: July 2008

Applies to: SQL Server 2008

 

Summary: This paper introduces key new Transact-SQL programmability features in Microsoft SQL Server 2008 as well as SQL/Common Language Runtime (CLR) enhancements. New Transact-SQL features provide improved performance, increased functionality, and enhanced globalization support. Transact-SQL programmability enhancements in SQL Server 2008 address the needs of both OLTP and data warehouse environments.

Acknowledgments: I would like to thank the following members of Solid Quality Mentors who provided feedback about this white paper: Kathy Blomstrom, Ken Spencer, Greg Low, Allan Hirt, Aaron Johal, James Winters, Gustavo Larriera, Andrew J. Kelly, Gianluca Hotz, Davide Mauri, Francisco A González Díaz, Adam Machanic, Jesús López, Randy Dyess, and Eladio Rincón.

 

Introduction

Microsoft SQL Server 2008 introduces several important new Transact‑SQL programmability features and enhances some existing ones. This paper covers the new features and enhancements and demonstrates them through code samples. It will familiarize you with the new features by providing a high-level functionality overview but does not cover all technical details. You can find details in SQL Server Books Online.

This paper covers the following key new features:

·         Declaring and initializing variables

·         Compound assignment operators

·         Table value constructor support through the VALUES clause

·         Enhancements to the CONVERT function

·         New date and time data types and functions

·         Large UDTs

·         The HIERARCHYID data type

·         Table types and table-valued parameters

·         The MERGE statement

·         Grouping sets enhancements

·         DDL trigger enhancements

·         Sparse columns

·         Filtered indexes

·         Large CLR user-defined aggregates

·         Multi-input CLR user-defined aggregates

·         The ORDER option for CLR table-valued functions

·         Object dependencies

·         Change data capture

·         Collation alignment with Microsoft® Windows®

·         Deprecation

Transact-SQL Delighters

Transact-SQL delighters are small enhancements that, for the most part, help make the programming experience more convenient. This section covers the following Transact-SQL delighters: declare and initialize variables, compound assignment operators, table value constructor support through the VALUES clause, and the enhanced CONVERT function.

Declare and Initialize Variables

Microsoft SQL Server® 2008 enables you to initialize variables inline as part of the variable declaration statement instead of using separate DECLARE and SET statements. This enhancement helps you abbreviate your code. The following code example demonstrates inline initializations using a literal and a function:

DECLARE @i AS INT = 0, @d AS DATETIME = CURRENT_TIMESTAMP;

SELECT @i AS [@i], @d AS [@d];

Compound Assignment Operators

Compound assignment operators help abbreviate code that assigns a value to a column or a variable. The new operators are:

·         += (plus equals)

·         -=  (minus equals)

·         *=  (multiplication equals)

·         /=  (division equals)

·         %=  (modulo equals)

You can use these operators wherever assignment is normally allowed—for example, in the SET clause of an UPDATE statement or in a SET statement that assigns values to variables. The following code example demonstrates the use of the += operator:

DECLARE @price AS MONEY = 10.00;

SET @price += 2.00;

SELECT @price;

This code sets the variable @price to its current value, 10.00, plus 2.00, resulting in 12.00.
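Compound assignment also works in the SET clause of an UPDATE statement. The following minimal sketch assumes a hypothetical dbo.Products table with unitprice and categoryid columns, and raises the prices of one category by 10 percent:

-- dbo.Products and its columns are hypothetical, for illustration only
UPDATE dbo.Products
  SET unitprice *= 1.10
WHERE categoryid = 1;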

Table Value Constructor Support through the VALUES Clause

SQL Server 2008 introduces support for table value constructors through the VALUES clause. You can now use a single VALUES clause to construct a set of rows. One use of this feature is to insert multiple rows based on values in a single INSERT statement, as follows:

USE tempdb;

IF OBJECT_ID('dbo.Customers', 'U') IS NOT NULL DROP TABLE dbo.Customers;

CREATE TABLE dbo.Customers

(

  custid      INT         NOT NULL,

  companyname VARCHAR(25) NOT NULL,

  phone       VARCHAR(20) NOT NULL,

  address     VARCHAR(50) NOT NULL,

  CONSTRAINT PK_Customers PRIMARY KEY(custid)

);

INSERT INTO dbo.Customers(custid, companyname, phone, address)

  VALUES

  (1, 'cust 1', '(111) 111-1111', 'address 1'),

  (2, 'cust 2', '(222) 222-2222', 'address 2'),

  (3, 'cust 3', '(333) 333-3333', 'address 3'),

  (4, 'cust 4', '(444) 444-4444', 'address 4'),

  (5, 'cust 5', '(555) 555-5555', 'address 5');

Note that even though no explicit transaction is defined here, this INSERT statement is considered an atomic operation. So if any row fails to enter the table, the entire INSERT operation fails.

A table value constructor can be used to define table expressions such as derived tables and CTEs (common table expressions), and can be used where table expressions are allowed (such as in the FROM clause of a SELECT statement or as the source table in a MERGE statement). The following example demonstrates using the VALUES clause to define a derived table in the context of an outer SELECT statement:

SELECT *

FROM

  (VALUES

     (1, 'cust 1', '(111) 111-1111', 'address 1'),

     (2, 'cust 2', '(222) 222-2222', 'address 2'),

     (3, 'cust 3', '(333) 333-3333', 'address 3'),

     (4, 'cust 4', '(444) 444-4444', 'address 4'),

     (5, 'cust 5', '(555) 555-5555', 'address 5')

  ) AS C(custid, companyname, phone, address);

The outer query can operate on this table expression like any other table expression, including joins, filtering, grouping, and so on.
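For example, the following self-contained query filters and groups over a table value constructor just as it would over any other table expression:

SELECT custid, SUM(qty) AS totalqty
FROM
  (VALUES
     (1, 10),
     (1, 20),
     (2, 15)
  ) AS O(custid, qty)
WHERE custid <= 2
GROUP BY custid;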

Enhanced CONVERT Function

The CONVERT function is enhanced in SQL Server 2008 to support new conversion options between character strings and binary data types. You determine the conversion option to use by specifying a style number as the third argument. Style 0 is the default behavior that was supported in previous SQL Server versions—this style translates character strings to the binary representation of the ASCII codes for the characters and vice versa.

Styles 1 and 2 introduce new functionality. These styles convert a hex string literal to a binary value that contains the same digits when presented in hex form and vice versa. If you use Style 1 when converting from a character string to a binary value, the input character string should include the 0x prefix; if you use Style 2, it should not. Similarly, if you use Style 1 when converting from a binary value to a character string, the output character string will contain the 0x prefix; if you use Style 2, it will not. The following example demonstrates using styles 1 and 2:

 

SELECT

  CONVERT(VARCHAR(12) , 0x49747A696B  , 1) AS [Bin to Char 1],

  CONVERT(VARBINARY(5), '0x49747A696B', 1) AS [Char to Bin 1],

  CONVERT(VARCHAR(12) , 0x49747A696B  , 2) AS [Bin to Char 2],

  CONVERT(VARBINARY(5), '49747A696B'  , 2) AS [Char to Bin 2];

 

This code produces the following output:

 

Bin to Char 1 Char to Bin 1 Bin to Char 2 Char to Bin 2

------------- ------------- ------------- -------------

0x49747A696B  0x49747A696B  49747A696B    0x49747A696B

Date and Time Data Types

Before SQL Server 2008, date and time improvements were probably at the top of the list of the most requested improvements for SQL Server—especially the request for separate date and time data types, but also for general enhanced support for temporal data. SQL Server 2008 introduces four new date and time data types—DATE, TIME, DATETIME2, and DATETIMEOFFSET—as well as new functions that operate on the new types and enhancements to existing functions.

New Data Types

The four new date and time data types provide a split between date and time, support for a larger date range, improved accuracy, and support for a time zone element. The new data types follow the SQL standard in their semantics and behavior. The DATE and TIME data types split the date and time, which in previous versions were consolidated. The DATETIME2 data type is an improved version of DATETIME, providing support for a larger date range and better accuracy. The DATETIMEOFFSET data type is similar to DATETIME2 with the addition of a time zone component. Table 1 describes the new data types, showing their storage in bytes, date-range support, accuracy, recommended entry format for literals, and an example.

 

Data Type       Storage (bytes)  Date Range                                Accuracy         Recommended Entry Format and Example
--------------  ---------------  ----------------------------------------  ---------------  ----------------------------------------
DATE            3                January 1, 0001, through December 31,     1 day            'YYYY-MM-DD'
                                 9999 (Gregorian calendar)                                  '2009-02-12'
TIME            3 to 5           N/A (time only)                           100 nanoseconds  'hh:mm:ss.nnnnnnn'
                                                                                            '12:30:15.1234567'
DATETIME2       6 to 8           January 1, 0001, through December 31,     100 nanoseconds  'YYYY-MM-DD hh:mm:ss.nnnnnnn'
                                 9999                                                       '2009-02-12 12:30:15.1234567'
DATETIMEOFFSET  8 to 10          January 1, 0001, through December 31,     100 nanoseconds  'YYYY-MM-DD hh:mm:ss.nnnnnnn [+|-]hh:mm'
                                 9999                                                       '2009-02-12 12:30:15.1234567 +02:00'
'2009-02-12 12:30:15.1234567 +02:00'

 

Table 1: New Date and Time Data Types

Note that the format 'YYYY-MM-DD' is language neutral for the new data types, but it is language dependent for the DATETIME and SMALLDATETIME data types. The language-neutral format for those data types is 'YYYYMMDD'.
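The following example demonstrates the difference. Under the British language setting (d/m/y date order), DATETIME interprets the literal '2009-02-12' as December 2, 2009, whereas the new DATE type always interprets it as February 12, 2009:

SET LANGUAGE British;
SELECT
  CAST('2009-02-12' AS DATETIME) AS dt, -- December 2, 2009
  CAST('2009-02-12' AS DATE)     AS d;  -- February 12, 2009
SET LANGUAGE us_english;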

The three new types that contain a time component (TIME, DATETIME2, and DATETIMEOFFSET) enable you to specify the fractional seconds precision in parentheses following the type name. The default is 7, meaning 100 nanoseconds. If you need a different fractional second accuracy, such as milliseconds (three digits), you must specify it explicitly: DATETIME2(3).
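For example, the following declarations, shown as a small illustration, assign the same literal to types with different fractional second precisions; each value is rounded to its type's precision:

DECLARE
  @t0   AS TIME(0)      = '12:30:15.1234567',               -- 12:30:15
  @dt23 AS DATETIME2(3) = '2009-02-12 12:30:15.1234567';    -- ...15.123
SELECT @t0 AS [@t0], @dt23 AS [@dt23];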

Metadata information in SQL Server 2008 reports meaningful precision and scale values for the new date and time data types. In the sys.columns view, the precision attribute describes the total number of characters in the default literal string representation of the value, and the scale describes the number of digits in the fractional part of the seconds. In INFORMATION_SCHEMA.COLUMNS, the DATETIME_PRECISION attribute describes the number of digits in the fractional part of the seconds.
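As a quick check, the following code (for test purposes in tempdb) creates a table with a DATETIME2(3) column and queries its metadata; the default literal form 'YYYY-MM-DD hh:mm:ss.nnn' is 23 characters long, so the reported precision is 23 and the scale is 3:

USE tempdb;
IF OBJECT_ID('dbo.TestDT2', 'U') IS NOT NULL DROP TABLE dbo.TestDT2;
CREATE TABLE dbo.TestDT2(dt2 DATETIME2(3));
SELECT name, precision, scale
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.TestDT2');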

The following code shows an example of using the new types:

 

DECLARE

  @d   AS DATE           = '2009-02-12',

  @t   AS TIME           = '12:30:15.1234567',

  @dt2 AS DATETIME2      = '2009-02-12 12:30:15.1234567',

  @dto AS DATETIMEOFFSET = '2009-02-12 12:30:15.1234567 +02:00';

 

SELECT @d AS [@d], @t AS [@t], @dt2 AS [@dt2], @dto AS [@dto];

 

The new data types are fully supported by the SQL Server Native Client OLE DB and ODBC providers as well as by ADO.NET in Microsoft Visual Studio® 2008. Table 2 shows the mappings between the new SQL Server 2008 data types and the corresponding client provider types.

 

SQL             ODBC                    OLE DB                    ADO.NET
--------------  ----------------------  ------------------------  --------------
DATE            SQL_TYPE_DATE/          DBTYPE_DBDATE             DateTime
                SQL_DATE
TIME            SQL_TIME/               DBTYPE_DBTIME/            TimeSpan
                SQL_SS_TIME2            DBTYPE_DBTIME2
DATETIMEOFFSET  SQL_SS_TIMESTAMPOFFSET  DBTYPE_DBTIMESTAMPOFFSET  DateTimeOffset
DATETIME2       SQL_TYPE_TIMESTAMP/     DBTYPE_DBTIMESTAMP        DateTime
                SQL_TIMESTAMP

 

Table 2: Date and Time Data Type Mappings to Client Providers

New and Enhanced Functions

To support the new date and time data types, SQL Server 2008 introduces new functions and enhances existing functions. The new functions are SYSDATETIME, SYSUTCDATETIME, SYSDATETIMEOFFSET, SWITCHOFFSET, and TODATETIMEOFFSET.

SYSDATETIME, SYSUTCDATETIME, and SYSDATETIMEOFFSET return the current system date and time value. SYSDATETIME returns the current date and time as a DATETIME2 value, SYSUTCDATETIME returns the current date and time in UTC as a DATETIME2 value, and SYSDATETIMEOFFSET returns the current date and time along with the system time zone as a DATETIMEOFFSET value. To get only the current date or only the current time, you can cast the value returned from the SYSDATETIME function to DATE or TIME, as the following example shows:

 

SELECT

  CAST(SYSDATETIME() AS DATE) AS [current_date],

  CAST(SYSDATETIME() AS TIME) AS [current_time];

 

Interestingly, when converting a column whose data type contains both a date and a time component to DATE, the SQL Server query optimizer can still rely on index ordering to process a query more efficiently. This is contrary to the usual behavior, where conversion of a column to a different type prevents the optimizer from relying on index order. The new behavior means that the optimizer might consider an index seek for a query filter that has a conversion to DATE. For example, the plan for the following query performs an index seek on the index on the CurrencyRateDate DATETIME column:

 

USE AdventureWorks;

SELECT FromCurrencyCode, ToCurrencyCode, EndOfDayRate

FROM Sales.CurrencyRate

WHERE CAST(CurrencyRateDate AS DATE) = '20040701';

 

The SWITCHOFFSET function adjusts an input DATETIMEOFFSET value to a specified time zone, while preserving the UTC value. The syntax is SWITCHOFFSET(datetimeoffset_value, time_zone). For example, the following code adjusts the current system datetimeoffset value to time zone -05:00:

SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(), '-05:00');

So if the current system datetimeoffset value is February 12, 2009 10:00:00.0000000 -08:00, this code returns the value February 12, 2009 13:00:00.0000000 -05:00.

The TODATETIMEOFFSET function sets the time zone offset of an input date and time value. Its syntax is TODATETIMEOFFSET(date_and_time_value, time_zone).

This function is different from SWITCHOFFSET in several ways. First, it is not restricted to a datetimeoffset value as input; rather it accepts any date and time data type. Second, it does not try to adjust the time based on the time zone difference between the source value and the specified time zone but instead simply returns the input date and time value with the specified time zone as a datetimeoffset value.

The main purpose of the TODATETIMEOFFSET function is to convert types that are not time zone aware to DATETIMEOFFSET by the given time zone offset. If the given date and time value is a DATETIMEOFFSET, the TODATETIMEOFFSET function changes the DATETIMEOFFSET value based on the same original local date and time value plus the new given time zone offset.

For example, suppose the current system datetimeoffset value is February 12, 2009 10:00:00.0000000 -08:00 and you run the following code:

SELECT TODATETIMEOFFSET(SYSDATETIMEOFFSET(), '-05:00');

The value February 12, 2009 10:00:00.0000000 -05:00 is returned. Remember that the SWITCHOFFSET function returned February 12, 2009 13:00:00.0000000 -05:00 because it adjusted the time based on the time zone differences between the input (-08:00) and the specified time zone (-05:00).

As mentioned earlier, you can use the TODATETIMEOFFSET function with any date and time data type as input. For example, the following code takes the current system date and time value and returns it as a datetimeoffset value with a time zone of -05:00:

SELECT TODATETIMEOFFSET(SYSDATETIME(), '-05:00');

Functions that accept a date and time part as input now support new parts. The DATEADD, DATEDIFF, DATENAME, and DATEPART functions add support for microseconds and nanoseconds. DATENAME and DATEPART also support the TZoffset part (time zone), with DATEPART also supporting the ISO_WEEK part (ISO 8601 week number).
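For example, the following query briefly exercises some of the new parts:

SELECT
  DATEPART(ISO_WEEK, '20090212')          AS iso_week,          -- 7
  DATEPART(TZoffset, SYSDATETIMEOFFSET()) AS tz_offset_minutes,
  DATEADD(NANOSECOND, 100, SYSDATETIME()) AS plus_100ns;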

Many other functions are enhanced in SQL Server 2008 to support the new date and time types, among them the existing date and time functions, type conversion functions (CAST and CONVERT), set and aggregate functions (such as MAX, MIN), metadata functions (such as TYPEPROPERTY, COLUMNPROPERTY), and system functions (such as DATALENGTH, ISDATE).

Large UDTs

In SQL Server 2005, user-defined types (UDTs) in the CLR were limited to 8,000 bytes. SQL Server 2008 lifts this limitation and now supports large UDTs. Similar to the built-in large object types that SQL Server supports, large UDTs can now reach up to 2 GB in size. If the UDT value does not exceed 8,000 bytes, the database system treats it as an inline value as in SQL Server 2005. If it exceeds 8,000 bytes, the system treats it as a large object and reports its size as “unlimited.”

One application where large UDTs are required is to support spatial data. For the most part, 8,000 bytes might be enough to represent a value of a spatial data type, but in some cases, the size requirement of such values might exceed 8,000 bytes. SQL Server 2008 introduces the built-in UDTs GEOMETRY and GEOGRAPHY, which are implemented as large CLR UDTs. These new types introduce support for spatial data. The GEOGRAPHY type supports round earth topology, and the GEOMETRY type supports flat earth topology. Spatial data is covered in detail in a separate paper.

A UDT’s size is defined by the attribute SqlUserDefinedTypeAttribute.MaxByteSize as part of the type’s definition. If this attribute is set to -1, the serialized UDT can reach the same size as other large object types (currently 2 GB); otherwise, the UDT cannot exceed the size specified in the MaxByteSize property.

Client APIs such as OLE DB and ODBC in the SQL Server Native Client and ADO.NET were enhanced to support large UDTs. Down-level clients (SQL Server 2005 and 2000 clients) convert a large UDT to VARBINARY(MAX) and IMAGE, respectively.

HIERARCHYID Data Type

The new HIERARCHYID data type in SQL Server 2008 is a system-supplied CLR UDT that can be useful for storing and manipulating hierarchies. This type is internally stored as a VARBINARY value that represents the position of the current node in the hierarchy (both in terms of parent-child position and position among siblings). You can perform manipulations on the type by using either Transact-SQL or client APIs to invoke methods exposed by the type. Let’s look at indexing strategies for the HIERARCHYID type, how to use the type to insert new nodes into a hierarchy, and how to query hierarchies.

Indexing Strategies

The following code demonstrates how to use the HIERARCHYID type, creating a table named Employees (in the tempdb database for test purposes) with a column called hid that holds the HIERARCHYID value:

 

USE tempdb;

IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL DROP TABLE dbo.Employees;

CREATE TABLE dbo.Employees

(

  empid   INT NOT NULL,

  hid     HIERARCHYID NOT NULL,

  lvl AS hid.GetLevel() PERSISTED,

  empname VARCHAR(25) NOT NULL,

  salary  MONEY       NOT NULL,

  CONSTRAINT PK_Employees PRIMARY KEY NONCLUSTERED(empid)

);

 

Notice how the code uses the GetLevel method of the hid column to define the persisted computed column lvl. This method returns the level of the current node in the hierarchy.

The HIERARCHYID value provides topological sorting, meaning that a child’s sort value is guaranteed to be higher than the parent’s sort value. This definition is transitive; that is, a node’s sort value is guaranteed to be higher than all of its ancestors. So if you create an index on the HIERARCHYID column, the index sorts the data in a depth-first manner—all members of the same subtree are close to each other in the leaf of the index. Such an index can efficiently satisfy requests such as returning all descendents of a node.

You can use another indexing strategy called breadth-first, in which you organize all nodes from the same level close to each other in the leaf of the index by having the column representing the level in the hierarchy first in the key list. Requests that can benefit from such an index include getting all nodes from the same level, getting direct subordinates of a node, and so on. The following code creates both types of indexes on the Employees table:

CREATE UNIQUE CLUSTERED INDEX idx_depth_first ON dbo.Employees(hid);

CREATE UNIQUE INDEX idx_breadth_first ON dbo.Employees(lvl, hid);

Inserting New Nodes

To insert a node into the hierarchy, you must first produce a new HIERARCHYID value that represents the correct position in the hierarchy. Use the HIERARCHYID::GetRoot() method to produce the value for the root node. You use the GetDescendant method to produce a value below a given parent. The GetDescendant method accepts two optional HIERARCHYID input values representing the two nodes between which you want to position the new node.

Note that the GetDescendant method does not guarantee that HIERARCHYID values are unique. To enforce uniqueness, you must define a primary key, a unique constraint, or a unique index on the column.

For example, the following code creates the usp_AddEmp stored procedure, which adds a new node to the hierarchy:

 

IF OBJECT_ID('dbo.usp_AddEmp', 'P') IS NOT NULL DROP PROC dbo.usp_AddEmp;

GO

CREATE PROC dbo.usp_AddEmp

  @empid   AS INT,

  @mgrid   AS INT = NULL,

  @empname AS VARCHAR(25),

  @salary  AS MONEY

AS

 

DECLARE

  @hid            AS HIERARCHYID,

  @mgr_hid        AS HIERARCHYID,

  @last_child_hid AS HIERARCHYID;

 

IF @mgrid IS NULL

  SET @hid = HIERARCHYID::GetRoot();

ELSE

BEGIN

  SET @mgr_hid = (SELECT hid FROM dbo.Employees WHERE empid = @mgrid);

  SET @last_child_hid =

    (SELECT MAX(hid) FROM dbo.Employees

     WHERE hid.GetAncestor(1) = @mgr_hid);

  SET @hid = @mgr_hid.GetDescendant(@last_child_hid, NULL);

END

 

INSERT INTO dbo.Employees(empid, hid, empname, salary)

  VALUES(@empid, @hid, @empname, @salary);

GO

 

If the input employee is the first node in the hierarchy, the procedure uses the HIERARCHYID::GetRoot() method to assign the hid value for the root. Otherwise, the procedure queries the last child hid value of the new employee’s manager; it then invokes the GetDescendant method to produce a value that positions the new node after the last child of that manager. Run the following code to populate the table with a few employees:

 

EXEC dbo.usp_AddEmp @empid =  1, @mgrid = NULL, @empname = 'David'  , @salary = $10000.00;

EXEC dbo.usp_AddEmp @empid =  2, @mgrid =    1, @empname = 'Eitan'  , @salary = $7000.00;

EXEC dbo.usp_AddEmp @empid =  3, @mgrid =    1, @empname = 'Ina'    , @salary = $7500.00;

EXEC dbo.usp_AddEmp @empid =  4, @mgrid =    2, @empname = 'Seraph' , @salary = $5000.00;

EXEC dbo.usp_AddEmp @empid =  5, @mgrid =    2, @empname = 'Jiru'   , @salary = $5500.00;

EXEC dbo.usp_AddEmp @empid =  6, @mgrid =    2, @empname = 'Steve'  , @salary = $4500.00;

EXEC dbo.usp_AddEmp @empid =  7, @mgrid =    3, @empname = 'Aaron'  , @salary = $5000.00;

EXEC dbo.usp_AddEmp @empid =  8, @mgrid =    5, @empname = 'Lilach' , @salary = $3500.00;

EXEC dbo.usp_AddEmp @empid =  9, @mgrid =    7, @empname = 'Rita'   , @salary = $3000.00;

EXEC dbo.usp_AddEmp @empid = 10, @mgrid =    5, @empname = 'Sean'   , @salary = $3000.00;

EXEC dbo.usp_AddEmp @empid = 11, @mgrid =    7, @empname = 'Gabriel', @salary = $3000.00;

EXEC dbo.usp_AddEmp @empid = 12, @mgrid =    9, @empname = 'Emilia' , @salary = $2000.00;

EXEC dbo.usp_AddEmp @empid = 13, @mgrid =    9, @empname = 'Michael', @salary = $2000.00;

EXEC dbo.usp_AddEmp @empid = 14, @mgrid =    9, @empname = 'Didi'   , @salary = $1500.00;

Querying the Hierarchy

If you query the hid value, you get its binary representation, which is not very meaningful. You can use the ToString method to get a more logical string representation of the value, which shows the path with a slash sign used as a separator between the levels. For example, run the following query to get both the binary and logical representations of the hid value:

 

SELECT hid, hid.ToString() AS path, lvl, empid, empname, salary

FROM dbo.Employees

ORDER BY hid;

 

Recall that HIERARCHYID values provide topological sorting and that the GetLevel method produces the level in the hierarchy. Using these, you can easily produce a graphical depiction of the hierarchy—simply sort the rows by hid, and produce indentation based on the lvl column as follows:

 

SELECT

  REPLICATE(' | ', lvl) + empname AS emp,

  hid.ToString() AS path

FROM dbo.Employees

ORDER BY hid;

 

To get all subordinates of an employee (subtree), you can use a method called IsDescendantOf. This method accepts a node’s HIERARCHYID value as input and returns 1 if the queried node is a descendant of the input node. For example, the following query returns all subordinates—direct and indirect—of employee 3:

 

SELECT C.empid, C.empname, C.lvl

FROM dbo.Employees AS P

  JOIN dbo.Employees AS C

    ON P.empid = 3

    AND C.hid.IsDescendantOf(P.hid) = 1;

You can also use the IsDescendantOf method to return all managers of a given employee. For example, the following query returns all managers of employee 14:

 

SELECT P.empid, P.empname, P.lvl

FROM dbo.Employees AS P

  JOIN dbo.Employees AS C

    ON C.empid = 14

    AND C.hid.IsDescendantOf(P.hid) = 1;

 

To get a whole level of subordinates of a certain employee, use the GetAncestor method. This method accepts a number (call it n) as input and returns the HIERARCHYID value of the ancestor of the queried node, n levels above. For example, the following query returns direct subordinates (1 level below) of employee 9:

 

SELECT C.empid, C.empname

FROM dbo.Employees AS P

  JOIN dbo.Employees AS C

    ON P.empid = 9

    AND C.hid.GetAncestor(1) = P.hid;

Other Supported Methods

You can use several other methods to manipulate the HIERARCHYID data type, including Parse, GetReparentedValue, Read, and Write.

The HIERARCHYID::Parse method converts a canonical string representation of a hierarchical value to HIERARCHYID. This is the same as using CAST(<string_val> AS HIERARCHYID).

The GetReparentedValue method helps you reparent nodes. It accepts two arguments—@old_root and @new_root—and returns a value in which the @old_root portion of the path is replaced with @new_root. For example, if the node you query currently has the path /1/1/2/3/2/ (logical representation), @old_root is /1/1/, and @new_root is /2/1/4/, the GetReparentedValue method returns /2/1/4/2/3/2/. As mentioned earlier in regard to the GetDescendant method, the GetReparentedValue method also does not guarantee unique HIERARCHYID values. To enforce uniqueness, you must define a primary key, a unique constraint, or a unique index on the column.
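The following minimal snippet illustrates this behavior using variables initialized from literal paths:

DECLARE
  @node     AS HIERARCHYID = CAST('/1/1/2/3/2/' AS HIERARCHYID),
  @old_root AS HIERARCHYID = CAST('/1/1/'       AS HIERARCHYID),
  @new_root AS HIERARCHYID = CAST('/2/1/4/'     AS HIERARCHYID);

SELECT @node.GetReparentedValue(@old_root, @new_root).ToString() AS new_path;
-- returns /2/1/4/2/3/2/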

The Read and Write methods, available only in CLR code, are used to read from a BinaryReader and write to a BinaryWriter. In Transact-SQL, you simply use the CAST function to convert a binary value to a HIERARCHYID value and vice versa. Similarly, you can use the CAST function to convert a logical string representation of the path to HIERARCHYID and vice versa.

Table Types and Table-Valued Parameters

SQL Server 2008 introduces table types and table-valued parameters that help abbreviate your code and improve its performance. Table types allow easy reuse of a table definition by table variables, and table-valued parameters enable you to pass a parameter of a table type to stored procedures and functions.

Table Types

Table types enable you to save a table definition in the database and use it later to define table variables and parameters to stored procedures and functions. Because table types let you reuse a table definition, they ensure consistency and reduce chances for errors.

You use the CREATE TYPE statement to create a new table type. For example, the following code defines a table type called OrderIDs in the AdventureWorks database:

 

USE AdventureWorks;

GO

CREATE TYPE dbo.OrderIDs AS TABLE

( pos INT NOT NULL PRIMARY KEY,

  orderid INT NOT NULL UNIQUE );

 

When declaring a table variable, simply specify the table type name as the data type of the variable. For example, the following code defines a table variable called @T of the OrderIDs type, inserts three rows into the table variable, and then queries it:

 

DECLARE @T AS dbo.OrderIDs;

INSERT INTO @T(pos, orderid) VALUES(1, 51480),(2, 51973),(3, 51819);

SELECT pos, orderid FROM @T ORDER BY pos;

 

To get metadata information about table types in the database, query the view sys.table_types.
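For example, after creating the OrderIDs type, the following query returns a row for it, including the ID of the internal table object that describes its structure:

SELECT name, type_table_object_id
FROM sys.table_types;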

Table-Valued Parameters

You can now use table types as the types for input parameters of stored procedures and functions. Currently, table-valued parameters are read only, and you must define them as such by using the READONLY keyword.

A common scenario where table-valued parameters are very useful is passing an “array” of keys to a stored procedure. Before SQL Server 2008, common ways to meet this need were based on dynamic SQL, a split function, XML, and other techniques. The approach using dynamic SQL involved the risk of SQL Injection and did not provide efficient reuse of execution plans. Using a split function was complicated, and using XML was cumbersome and nonrelational. (For details about this scenario, see "Arrays and Lists in SQL Server" by SQL Server MVP Erland Sommarskog at http://www.sommarskog.se/arrays-in-sql.html.)

In SQL Server 2008, you simply pass the stored procedure a table-valued parameter. There is no risk of SQL Injection, and there is opportunity for efficient reuse of execution plans. For example, the following procedure accepts a table-valued parameter of the OrderIDs type with a set of order IDs and returns all orders from the SalesOrderHeader table whose order IDs appear in the input table-valued parameter, sorted by the pos column:

 

CREATE PROC dbo.usp_getorders(@T AS dbo.OrderIDs READONLY)

AS

SELECT O.SalesOrderID, O.OrderDate, O.CustomerID, O.TotalDue

FROM Sales.SalesOrderHeader AS O

  JOIN @T AS T

    ON O.SalesOrderID = T.orderid

ORDER BY T.pos;

GO

 

The following code invokes the stored procedure:

 

DECLARE @MyOrderIDs AS dbo.OrderIDs;

INSERT INTO @MyOrderIDs(pos, orderid)

  VALUES(1, 51480),(2, 51973),(3, 51819);

EXEC dbo.usp_getorders @T = @MyOrderIDs;

 

Note that when you do not provide a parameter value, a table-valued parameter defaults to an empty table. This is important to stress because this case might be confused with an actual empty table passed to the procedure. Also note that you cannot set variables and parameters of a table type to NULL.
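For example, the following call omits the table-valued parameter altogether; the procedure executes with an empty table as input and simply returns no rows:

EXEC dbo.usp_getorders;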

SQL Server 2008 also enhances client APIs to support defining and populating table-valued parameters. Table-valued parameters are treated internally like table variables. Their scope is the batch (procedure, function). They have several advantages in some cases over temporary tables and other alternative methods:

·         They are strongly typed.

·         SQL Server does not maintain distribution statistics (histograms) for them; therefore, they do not cause recompilations.

·         They are not affected by a transaction rollback.

·         They provide a simple programming model.

MERGE Statement

The new MERGE statement is a standard statement that combines INSERT, UPDATE, and DELETE actions as a single atomic operation based on conditional logic. Besides being performed as an atomic operation, the MERGE statement is more efficient than applying those actions individually.

The statement refers to two tables: a target table specified in the MERGE INTO clause and a source table specified in the USING clause. The target table is the target for the modification, and the source table data can be used to modify the target.

The semantics (as well as optimization) of a MERGE statement are similar to those of an outer join. You specify a predicate in the ON clause that defines which rows in the source have matches in the target, which rows do not, and which rows in the target do not have a match in the source. You have a clause for each case that defines which action to take—WHEN MATCHED THEN, WHEN NOT MATCHED [BY TARGET] THEN, and WHEN NOT MATCHED BY SOURCE THEN. Note that you do not have to specify all three clauses, but only the ones you need.

As with other modification statements, the MERGE statement also supports the OUTPUT clause, which enables you to return attributes from the modified rows. As part of the OUTPUT clause, you can invoke the $action function, which returns the action that modified the row ('INSERT', 'UPDATE', 'DELETE').

To demonstrate the MERGE statement, the following code creates the tables Customers and CustomersStage in tempdb for test purposes and populates them with sample data:

 

USE tempdb;

IF OBJECT_ID('dbo.Customers', 'U') IS NOT NULL DROP TABLE dbo.Customers;

CREATE TABLE dbo.Customers

(

  custid      INT         NOT NULL,

  companyname VARCHAR(25) NOT NULL,

  phone       VARCHAR(20) NOT NULL,

  address     VARCHAR(50) NOT NULL,

  CONSTRAINT PK_Customers PRIMARY KEY(custid)

);

INSERT INTO dbo.Customers(custid, companyname, phone, address)

  VALUES

  (1, 'cust 1', '(111) 111-1111', 'address 1'),

  (2, 'cust 2', '(222) 222-2222', 'address 2'),

  (3, 'cust 3', '(333) 333-3333', 'address 3'),

  (4, 'cust 4', '(444) 444-4444', 'address 4'),

  (5, 'cust 5', '(555) 555-5555', 'address 5');

 

IF OBJECT_ID('dbo.CustomersStage', 'U') IS NOT NULL
  DROP TABLE dbo.CustomersStage;

CREATE TABLE dbo.CustomersStage

(

  custid      INT         NOT NULL,

  companyname VARCHAR(25) NOT NULL,

  phone       VARCHAR(20) NOT NULL,

  address     VARCHAR(50) NOT NULL,

  CONSTRAINT PK_CustomersStage PRIMARY KEY(custid)

);

INSERT INTO dbo.CustomersStage(custid, companyname, phone, address)

  VALUES

  (2, 'AAAAA', '(222) 222-2222', 'address 2'),

  (3, 'cust 3', '(333) 333-3333', 'address 3'),

  (5, 'BBBBB', 'CCCCC', 'DDDDD'),

  (6, 'cust 6 (new)', '(666) 666-6666', 'address 6'),

  (7, 'cust 7 (new)', '(777) 777-7777', 'address 7');

 

The following MERGE statement defines the Customers table as the target for the modification and the CustomersStage table as the source. The MERGE condition matches the custid attribute in the source with the custid attribute in the target. When a match is found in the target, the target customer’s attributes are overwritten with the source customer attributes. When a match is not found in the target, a new row is inserted into the target, using the source customer attributes. When a match is not found in the source, the target customer row is deleted:

 

MERGE INTO dbo.Customers AS TGT

USING dbo.CustomersStage AS SRC

  ON TGT.custid = SRC.custid

WHEN MATCHED THEN

  UPDATE SET

    TGT.companyname = SRC.companyname,

    TGT.phone = SRC.phone,

    TGT.address = SRC.address

WHEN NOT MATCHED THEN

  INSERT (custid, companyname, phone, address)

  VALUES (SRC.custid, SRC.companyname, SRC.phone, SRC.address)

WHEN NOT MATCHED BY SOURCE THEN

  DELETE

OUTPUT

  $action, deleted.custid AS del_custid, inserted.custid AS ins_custid;

 

This MERGE statement updates three rows (customers 2, 3, and 5), inserts two rows (customers 6 and 7), and deletes two rows (customers 1 and 4).

The MERGE statement lets you specify an additional predicate in all WHEN clauses (add an AND operator followed by the predicate). For the action to take place, besides the original ON predicate, the additional predicate must also hold true.
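For example, the following variation of the previous statement, shown as a sketch, updates a matched customer only when at least one attribute actually differs, avoiding unnecessary updates:

MERGE INTO dbo.Customers AS TGT
USING dbo.CustomersStage AS SRC
  ON TGT.custid = SRC.custid
WHEN MATCHED AND
     (   TGT.companyname <> SRC.companyname
      OR TGT.phone       <> SRC.phone
      OR TGT.address     <> SRC.address) THEN
  UPDATE SET
    TGT.companyname = SRC.companyname,
    TGT.phone       = SRC.phone,
    TGT.address     = SRC.address
WHEN NOT MATCHED THEN
  INSERT (custid, companyname, phone, address)
  VALUES (SRC.custid, SRC.companyname, SRC.phone, SRC.address);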

Grouping Sets

SQL Server 2008 introduces several extensions to the GROUP BY clause that enable you to define multiple groupings in the same query. These extensions are: the GROUPING SETS, CUBE, and ROLLUP subclauses of the GROUP BY clause and the GROUPING_ID function. The new extensions are standard and should not be confused with the older, nonstandard CUBE and ROLLUP options.

GROUPING SETS, CUBE, and ROLLUP Subclauses

To demonstrate the new extensions, let’s query the Orders table that the following code creates and populates:

 

USE tempdb;

IF OBJECT_ID('dbo.Orders', 'U') IS NOT NULL DROP TABLE dbo.Orders;

CREATE TABLE dbo.Orders

(

  orderid   INT        NOT NULL,

  orderdate DATETIME   NOT NULL,

  empid     INT        NOT NULL,

  custid    VARCHAR(5) NOT NULL,

  qty       INT        NOT NULL,

  CONSTRAINT PK_Orders PRIMARY KEY(orderid)

);

INSERT INTO dbo.Orders (orderid, orderdate, empid, custid, qty)

VALUES

  (30001, '20060802', 3, 'A', 10), (10001, '20061224', 1, 'A', 12),

  (10005, '20061224', 1, 'B', 20), (40001, '20070109', 4, 'A', 40),

  (10006, '20070118', 1, 'C', 14), (20001, '20070212', 2, 'B', 12),

  (40005, '20080212', 4, 'A', 10), (20002, '20080216', 2, 'C', 20),

  (30003, '20080418', 3, 'B', 15), (30004, '20060418', 3, 'C', 22),

  (30007, '20060907', 3, 'D', 30);

 

Without the extensions, a single query normally defines one “grouping set” (a set of attributes to group by) in the GROUP BY clause. If you want to calculate aggregates for multiple grouping sets, you usually need multiple queries. If you want to unify the result sets of multiple GROUP BY queries, each with a different grouping set, you must use the UNION ALL set operation between the queries. 

You might need to calculate and store aggregates for various grouping sets in a table. By preprocessing and materializing the aggregates, you can support applications that require fast response time for aggregate requests. However, the aforementioned approach, in which you have a separate query for each grouping set, is very inefficient. This approach requires a separate scan of the data for each grouping set and expensive calculation of aggregates.

With the new GROUPING SETS subclause, you simply list all grouping sets that you need. Logically, you get the same result set as you would by unifying the result sets of multiple queries. However, with the GROUPING SETS subclause, which requires much less code, SQL Server optimizes data access and the calculation of aggregates. SQL Server will not necessarily need to scan data once for each grouping set; plus, in some cases it calculates higher-level aggregates based on lower-level aggregates instead of re-aggregating base data.

For example, the following query uses the new GROUPING SETS subclause to return aggregates for four grouping sets:

 

SELECT custid, empid, YEAR(orderdate) AS orderyear, SUM(qty) AS qty

FROM dbo.Orders

GROUP BY GROUPING SETS (

  ( custid, empid, YEAR(orderdate) ),

  ( custid, YEAR(orderdate)        ),

  ( empid, YEAR(orderdate)         ),

  () );

 

The grouping sets defined in this query are (custid, empid, YEAR(orderdate)), (custid, YEAR(orderdate)), (empid, YEAR(orderdate)), and (). The last is an empty grouping set representing ALL. It is similar to an aggregate query with no GROUP BY clause, in which you treat the whole table as a single group.

The two new CUBE and ROLLUP subclauses are best thought of as abbreviations of the GROUPING SETS subclause. The CUBE subclause produces the power set of the set of elements listed in its parentheses. In other words, it produces all possible grouping sets that can be formed out of the elements listed in parentheses, including the empty grouping set. For example, the following use of CUBE:

CUBE( a, b, c )

is logically equivalent to:

GROUPING SETS((a),(b),(c),(a, b),(a, c),(b, c),(a, b, c),())

For n elements, CUBE produces 2^n grouping sets.

Out of the elements listed in its parentheses, the ROLLUP subclause produces only the grouping sets that have business value, assuming a hierarchy between the elements. For example, the following use of ROLLUP:

ROLLUP( country, region, city )

is logically equivalent to:

GROUPING SETS((country, region, city),(country, region),(country),())

Notice that cases that have no business value, assuming a hierarchy between the elements—such as (city)—were not produced. There might be multiple cities with the same name in the world, and even within the same country, so there is no business value in aggregating them.

Grouping Sets Algebra

You are not restricted to only one subclause in the GROUP BY clause; you can specify multiple subclauses separated by commas. The comma serves as a product operator, meaning that you get a Cartesian product of the grouping sets represented by each subclause. For example, the following code represents a Cartesian product between two GROUPING SETS subclauses:

GROUPING SETS ( (a, b), (c, d) ), GROUPING SETS ( (w, x), (y, z) )

This code is logically equivalent to:

GROUPING SETS ( (a, b, w, x), (a, b, y, z), (c, d, w, x), (c, d, y, z) )

Bearing in mind that the CUBE and ROLLUP options are simply abbreviations of the GROUPING SETS subclause, you can also use CUBE and ROLLUP subclauses as part of a Cartesian product.
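For example, the following query against dbo.Orders produces the product of the grouping sets (custid) and (empid) with those generated by ROLLUP(YEAR(orderdate)):

SELECT custid, empid, YEAR(orderdate) AS orderyear, SUM(qty) AS qty
FROM dbo.Orders
GROUP BY
  GROUPING SETS( (custid), (empid) ),
  ROLLUP( YEAR(orderdate) );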

GROUPING_ID Function

The GROUPING_ID function lets you identify the grouping set that each result row belongs to. As input, you provide all attributes that participate in any grouping set. The function produces an integer result that is a bitmap, in which each bit represents a different attribute. This way the function produces a unique integer for each grouping set.

The following query shows an example of using the GROUPING_ID function to identify the grouping set:

 

SELECT

  GROUPING_ID(

    custid, empid,

    YEAR(orderdate), MONTH(orderdate), DAY(orderdate) ) AS grp_id,

  custid, empid,

  YEAR(orderdate) AS orderyear,

  MONTH(orderdate) AS ordermonth,

  DAY(orderdate) AS orderday,

  SUM(qty) AS qty

FROM dbo.Orders

GROUP BY

  CUBE(custid, empid),

  ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));

 

Notice that in the output, shown here in abbreviated form, each grouping set is represented by a unique integer:

 

grp_id  custid empid       orderyear   ordermonth  orderday    qty

------- ------ ----------- ----------- ----------- ----------- -----------

0       C      3           2006        4           18          22

16      NULL   3           2006        4           18          22

24      NULL   NULL        2006        4           18          22

25      NULL   NULL        2006        4           NULL        22

0       A      3           2006        8           2           10

16      NULL   3           2006        8           2           10

24      NULL   NULL        2006        8           2           10

25      NULL   NULL        2006        8           NULL        10

0       D      3           2006        9           7           30

16      NULL   3           2006        9           7           30

...

For example, the integer 25 represents the grouping set (orderyear, ordermonth). The bits representing the elements that are part of the grouping set are turned off (ordermonth – 2 and orderyear – 4), and the bits representing the elements that are not part of the grouping set are turned on (orderday – 1, empid – 8, and custid – 16). The integer 25 is achieved by adding the values represented by the bits that are turned on: 1 + 8 + 16 = 25. This feature is especially useful when you need to materialize the aggregates and then query only specific grouping sets. You can cluster the table by the grp_id attribute, which would allow SQL Server to efficiently satisfy a request for a specific grouping set.
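As a sketch, assuming the aggregates were materialized in a hypothetical table named dbo.GrpAggs (including the grp_id column) that is clustered by grp_id, retrieving a single grouping set becomes a simple seek:

SELECT custid, empid, orderyear, ordermonth, orderday, qty
FROM dbo.GrpAggs  -- hypothetical table holding the materialized aggregates
WHERE grp_id = 25;  -- the (orderyear, ordermonth) grouping set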

DDL Trigger Enhancements

In SQL Server 2008, the set of events on which you can create DDL triggers is extended to include stored procedures that perform DDL-like operations. This gives you more complete coverage of DDL events that you can capture with triggers. The XML schema for events is installed as part of the database engine installation in the directory C:\Program Files\Microsoft SQL Server\100\Tools\Binn\schemas\sqlserver\2006\11\events\events.xsd and can also be found at https://schemas.microsoft.com/sqlserver.

Many stored procedures perform DDL-like operations. Before SQL Server 2008, you could not capture their invocation with a trigger. Now you can capture many new events that fire as a result of calls to such procedures. You can find the full list of trappable events in SQL Server Books Online. For example, the stored procedure sp_rename now fires a trigger created on the new RENAME event. To demonstrate this type of trigger, the following code creates a database called testdb and, within it, a trigger on the RENAME event that prints the source and target entity details for test purposes:

 

USE master;

GO

IF DB_ID('testdb') IS NOT NULL DROP DATABASE testdb;

CREATE DATABASE testdb;

GO

USE testdb;

GO

CREATE TRIGGER trg_testdb_rename ON DATABASE FOR RENAME

AS

 

DECLARE

  @SchemaName       AS SYSNAME =

    EVENTDATA().value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname'),

  @TargetObjectName AS SYSNAME =

    EVENTDATA().value('(/EVENT_INSTANCE/TargetObjectName)[1]', 'sysname'),

  @ObjectName       AS SYSNAME =

    EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname'),

  @NewObjectName    AS SYSNAME =

    EVENTDATA().value('(/EVENT_INSTANCE/NewObjectName)[1]', 'sysname');

 

DECLARE

  @msg AS NVARCHAR(1000) =

N'RENAME event occurred.

SchemaName: ' + @SchemaName + N'

TargetObjectName: ' + @TargetObjectName + N'

ObjectName: ' + @ObjectName + N'

NewObjectName: ' + @NewObjectName;

 

PRINT @msg;

GO

 

To test the trigger, the following code creates a table called dbo.T1 with a column called col1 and runs the sp_rename procedure to rename the column to col2:

 

CREATE TABLE dbo.T1(col1 INT);

EXEC sp_rename 'dbo.T1.col1', 'col2', 'COLUMN';

 

The trigger on the RENAME event fires and prints the following message:

 

RENAME event occurred.

SchemaName: dbo

TargetObjectName: T1

ObjectName: col1

NewObjectName: col2

Sparse Columns

Sparse columns are columns that are optimized for the storage of NULLs. To define a column as sparse, specify the SPARSE attribute as part of the column definition. Sparse columns consume no storage for NULLs, even with fixed-length types; however, when a column is marked as sparse, storage of non-NULL values becomes more expensive than usual. Therefore, you should define a column as sparse only when it will store a large percentage of NULLs. SQL Server Books Online provides recommendations for the percentage of NULLs that justify making a column sparse for each data type.

You query and manipulate sparse columns just as you do regular columns, with one exception described later in this paper. For example, the following code creates a table named T1 in tempdb (for test purposes), marks three of its columns with the SPARSE attribute, inserts a couple of rows, and queries the table:

 

USE tempdb;

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

CREATE TABLE dbo.T1

(

  keycol INT                   NOT NULL PRIMARY KEY,

  col1   VARCHAR(20)           NOT NULL,

  col2   INT            SPARSE     NULL,

  col3   CHAR(10)       SPARSE     NULL,

  col4   NUMERIC(12, 2) SPARSE     NULL

);

INSERT INTO dbo.T1(keycol, col1, col2) VALUES(1, 'a', 10);

INSERT INTO dbo.T1(keycol, col1, col4) VALUES(2, 'b', 20.00);

 

SELECT keycol, col1, col2, col3, col4

FROM dbo.T1;

 

There are several restrictions on using sparse columns not covered in this paper; see SQL Server Books Online for complete information.

SQL Server 2008 lets you define a column set that combines all sparse columns of a table into a single XML column. You might want to consider this option when you have a large number of sparse columns in a table (more than 1,024) and operating on them individually might be cumbersome.

To define a column set, specify the following as part of the CREATE TABLE statement:

 

<column_set_name> XML column_set FOR ALL_SPARSE_COLUMNS

 

For example, the following code recreates the table T1 with a column set named cs:

 

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

CREATE TABLE dbo.T1

(

  keycol INT                   NOT NULL PRIMARY KEY,

  col1   VARCHAR(20)           NOT NULL,

  col2   INT            SPARSE     NULL,

  col3   CHAR(10)       SPARSE     NULL,

  col4   NUMERIC(12, 2) SPARSE     NULL,

  cs     XML column_set FOR ALL_SPARSE_COLUMNS

);

 

You can use the same code that was used earlier to insert rows into the table, and then query it:

 

INSERT INTO dbo.T1(keycol, col1, col2) VALUES(1, 'a', 10);

INSERT INTO dbo.T1(keycol, col1, col4) VALUES(2, 'b', 20.00);

 

SELECT keycol, col1, col2, col3, col4

FROM dbo.T1;

 

But you can also operate on the column set by using XML operations instead of relational operations. For example, the following code inserts a row into the table by using the column set:

 

INSERT dbo.T1(keycol, col1, cs)

  VALUES(3, 'c', '<col3>CCCCCCCCCC</col3><col4>30.00</col4>');

 

NULL is assumed for a column that is omitted from the XML value, such as col2 in this case.

Note that if you have a column set defined in the table, SELECT * does not return the same result as a SELECT statement with an explicit list of all columns. SELECT * returns all sparse columns as a single XML value in the column set. To demonstrate this, run the following code:

 

SELECT * FROM dbo.T1;

 

This code returns the following output:

 

keycol      col1       cs

----------- ---------- ------------------------------------------

1           a          <col2>10</col2>

2           b          <col4>20.00</col4>

3           c          <col3>CCCCCCCCCC</col3><col4>30.00</col4>

 

If you explicitly list the columns in the SELECT clause, all result columns are returned as relational ones.

Another new feature that can be used in conjunction with sparse columns is filtered indexes. This feature is explained in the next section.

Filtered Indexes and Statistics

SQL Server 2008 introduces filtered indexes and statistics. You can now create a nonclustered index based on a predicate, and only the subset of rows for which the predicate holds true are stored in the index B-Tree. Similarly, you can manually create statistics based on a predicate. The optimizer has the logic to figure out when such filtered indexes and statistics are useful.

Well-designed filtered indexes can improve query performance and plan quality because they are smaller than nonfiltered indexes. Also, filtered statistics—whether created automatically for a filtered index or manually—are more accurate than nonfiltered statistics because they need to cover only a subset of rows from the table.

You can also reduce index maintenance cost by using filtered indexes because there is less data to maintain. This includes modifications against the index, index rebuilds, and the cost of updating statistics. Filtered indexes also obviously reduce storage costs.
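For example, the following statement (the statistics name is arbitrary) manually creates filtered statistics on the Freight column of the Sales.SalesOrderHeader table, limited to the orders of one territory:

USE AdventureWorks;
CREATE STATISTICS stats_freight_territory4
  ON Sales.SalesOrderHeader(Freight)
  WHERE TerritoryID = 4;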

Let’s look at a few examples that demonstrate filtered indexes. The following code creates an index on the CurrencyRateID column in the Sales.SalesOrderHeader table, with a filter that excludes NULLs:

 

USE AdventureWorks;

GO

CREATE NONCLUSTERED INDEX idx_currate_notnull

  ON Sales.SalesOrderHeader(CurrencyRateID)

  WHERE CurrencyRateID IS NOT NULL;

 

In query filters, every predicate other than IS NULL (which explicitly looks for NULLs) excludes NULLs, so the optimizer knows that it can potentially use the index. For example, the plan for the following query shows that the index is used:

 

SELECT *

FROM Sales.SalesOrderHeader

WHERE CurrencyRateID = 4;

 

The CurrencyRateID column has a large percentage of NULLs; therefore, this index consumes substantially less storage than a nonfiltered one on the same column. You can also create similar indexes on sparse columns.

The following code creates a nonclustered index on the Freight column, filtering rows where the Freight is greater than or equal to 5000.00:

 

CREATE NONCLUSTERED INDEX idx_freight_5000_or_more

  ON Sales.SalesOrderHeader(Freight)

  WHERE Freight >= 5000.00;

 

The optimizer considers using such an index when the query filter requests a subinterval of the index’s filtered interval:

 

SELECT *

FROM Sales.SalesOrderHeader

WHERE Freight BETWEEN 5500.00 AND 6000.00;

 

Filtered indexes can also be defined as UNIQUE and have an INCLUDE clause as with regular nonclustered indexes.
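For example, the following sketch (the index name and column choice are illustrative) creates a filtered index with included columns; a query that filters on Freight >= 5000.00 and returns only the key and included columns can then be satisfied from the filtered index alone:

CREATE NONCLUSTERED INDEX idx_freight_5000_or_more_cov
  ON Sales.SalesOrderHeader(Freight)
  INCLUDE(OrderDate, CustomerID, TotalDue)
  WHERE Freight >= 5000.00;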

SQL/CLR Enhancements

Common language runtime (CLR) support is enhanced in several ways in SQL Server 2008. Enhancements to CLR UDTs were described earlier. This section describes enhancements to user-defined aggregates (UDAs) and table-valued functions (TVFs).

Enhancements to User-Defined Aggregates

The new support for large UDTs was covered earlier in this paper. Similarly, SQL Server 2008 introduces support for large UDAs—the maximum size in bytes of a persisted value can now exceed 8,000 bytes and reach up to 2 GB. To allow a UDA to exceed 8,000 bytes, specify -1 in the MaxByteSize attribute; otherwise, you must specify a value that is smaller than or equal to 8,000, in which case the size in bytes cannot exceed the specified value.

UDAs are enhanced in another way as well—they now support multiple inputs. An example of a UDA that can benefit from both enhancements is one that performs string concatenation. The UDA can accept two input parameters: the column holding the string to concatenate and the character to use as a separator. The UDA can return an output larger than 8,000 bytes.
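As a hedged sketch, assuming such a UDA were registered under the hypothetical name dbo.StringConcat with MaxByteSize = -1, you could invoke it like any built-in aggregate, passing both inputs:

-- dbo.StringConcat is a hypothetical multi-input CLR UDA, for illustration only
SELECT dbo.StringConcat(companyname, N',') AS all_names
FROM dbo.Customers;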

Enhancements to Table-Valued Functions

CLR TVFs now support a new ORDER clause as part of the CREATE FUNCTION DDL statement. You can use this clause to name columns of the output table when you know that the function always returns the rows ordered by those columns. This can help the optimizer when you query the table function and rely on those columns for ordering purposes (such as when they are used in ORDER BY, GROUP BY, or DISTINCT, or in an INSERT when the target table has an index on those columns).

For example, the following C#® code defines a function called fn_split that accepts a separated list of values and a separator, and then splits the input string into its individual elements. The function returns a table result with two columns—pos and element—with a row for each element. The pos column represents the position of the element in the list, and the element column returns the element itself:

 

using System;

using System.Data.SqlTypes;

using Microsoft.SqlServer.Server;

using System.Collections;

using System.Collections.Generic;

 

public partial class TVF

{

    // Struct used in string split functions

    struct row_item

    {

        public string item;

        public int pos;

    }

 

    // Split array of strings and return a table

    // FillRowMethodName = "ArrSplitFillRow"

    [SqlFunction(FillRowMethodName = "ArrSplitFillRow",

     DataAccess = DataAccessKind.None,

     TableDefinition = "pos INT, element NVARCHAR(MAX)")]

    public static IEnumerable fn_split(SqlString inpStr,

        SqlString charSeparator)

    {

        string locStr;

        string[] splitStr;

        char[] locSeparator = new char[1];

        locSeparator[0] = (char)charSeparator.Value[0];

        if (inpStr.IsNull)

            locStr = "";

        else

            locStr = inpStr.Value;

        splitStr = locStr.Split(locSeparator,

              StringSplitOptions.RemoveEmptyEntries);

        //locStr.Split(charSeparator.ToString()[0]);

        List<row_item> SplitString = new List<row_item>();

        int i = 1;

        foreach (string s in splitStr)

        {

            row_item r = new row_item();

            r.item = s;

            r.pos = i;

            SplitString.Add(r);

            ++i;

        }

        return SplitString;

    }

 

    public static void ArrSplitFillRow(

      Object obj, out int pos, out string item)

    {

        pos = ((row_item)obj).pos;

        item = ((row_item)obj).item;

    }

}

 

The function always returns the rows in pos order; however, you cannot rely on this order when querying the function unless you specify an ORDER BY clause in the outer query.

Assuming that the path for the assembly’s .dll file is C:\TVF\TVF\bin\Debug\TVF.dll, the following code creates the assembly in tempdb for test purposes:

 

USE tempdb;

CREATE ASSEMBLY TVF FROM 'C:\TVF\TVF\bin\Debug\TVF.dll';

 

The following code registers two functions based on the CLR fn_split function: fn_split_no_order does not have the ORDER clause, and fn_split_order_by_pos specifies the ORDER clause with pos as the ordering column:

 

CREATE FUNCTION dbo.fn_split_no_order

  (@string AS NVARCHAR(MAX), @separator AS NCHAR(1))

RETURNS TABLE(pos INT, element NVARCHAR(4000))

EXTERNAL NAME TVF.TVF.fn_split;

GO

CREATE FUNCTION dbo.fn_split_order_by_pos

  (@string AS NVARCHAR(MAX), @separator AS NCHAR(1))

RETURNS TABLE(pos INT, element NVARCHAR(4000))

ORDER (pos)

EXTERNAL NAME TVF.TVF.fn_split;

GO

 

Now consider the following queries:

 

SELECT *

FROM dbo.fn_split_no_order(

  N'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',

  N',') AS T

ORDER BY pos;

 

SELECT *

FROM dbo.fn_split_order_by_pos(

  N'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z',

  N',') AS T

ORDER BY pos;

 

The first queries the table function that was registered without the ORDER clause, and the second queries the table function that was registered with ORDER(pos). Both request the data sorted by pos. If you examine the execution plans for both queries, you see that the plan for the first query involves sorting, while the plan for the second query does not. Also, the cost estimate for the first plan is about 10 times higher than the cost estimate for the second plan.

Object Dependencies

SQL Server 2008 delivers several objects that provide reliable discovery of object dependencies, replacing the unreliable older sys.sql_dependencies view and the sp_depends stored procedure. The new objects provide information about dependencies that appear in static code, including both schema-bound and non-schema-bound objects as well as cross-database and even cross-server dependencies. The new objects do not cover dependencies that appear in dynamic SQL code or in CLR code.

To see how to query the new object dependency information, first run the following code to create several objects in the tempdb database for test purposes:

 

USE tempdb;

IF OBJECT_ID('dbo.Proc1', 'P') IS NOT NULL DROP PROC dbo.Proc1;

IF OBJECT_ID('dbo.V1', 'V') IS NOT NULL DROP VIEW dbo.V1;

IF OBJECT_ID('dbo.V2', 'V') IS NOT NULL DROP VIEW dbo.V2;

IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL DROP TABLE dbo.T1;

IF OBJECT_ID('dbo.T2', 'U') IS NOT NULL DROP TABLE dbo.T2;

GO

 

CREATE PROC dbo.Proc1

AS

SELECT * FROM dbo.T1;

EXEC('SELECT * FROM dbo.T2');

GO

CREATE TABLE dbo.T1(col1 INT);

CREATE TABLE dbo.T2(col2 INT);

GO

CREATE VIEW dbo.V1

AS

SELECT col1 FROM dbo.T1;

GO

CREATE VIEW dbo.V2

AS

SELECT col1 FROM dbo.T1;

GO

 

SQL Server 2008 introduces three new objects that provide object dependency information: the sys.sql_expression_dependencies catalog view, and the sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities dynamic management functions (DMFs).

The sys.sql_expression_dependencies view provides object dependencies by name. It replaces the older sys.sql_dependencies view. The following query against sys.sql_expression_dependencies shows all dependencies in the current database:

 

SELECT

  OBJECT_SCHEMA_NAME(referencing_id) AS srcschema,

  OBJECT_NAME(referencing_id) AS srcname,

  referencing_minor_id AS srcminorid,

  referenced_schema_name AS tgtschema,

  referenced_entity_name AS tgtname,

  referenced_minor_id AS tgtminorid

FROM sys.sql_expression_dependencies;

 

This query produces the following output:

 

srcschema srcname srcminorid tgtschema tgtname tgtminorid

--------- ------- ---------- --------- ------- ----------

dbo       Proc1   0          dbo       T1      0

dbo       V1      0          dbo       T1      0

dbo       V2      0          dbo       T1      0

 

Notice that the query identified all dependencies in static code but not dependencies in dynamic code (the reference to dbo.T2 from dbo.Proc1).

The sys.dm_sql_referenced_entities DMF provides all entities that the input entity references—in other words, all entities that the input entity depends on. For example, the following code returns all entities that dbo.Proc1 depends on:

 

SELECT

  referenced_schema_name AS objschema,

  referenced_entity_name AS objname,

  referenced_minor_name  AS minorname,

  referenced_class_desc  AS class

FROM sys.dm_sql_referenced_entities('dbo.Proc1', 'OBJECT');

 

This code returns the following output:

 

objschema objname   minorname class

--------- --------- --------- ----------------

dbo       T1        NULL      OBJECT_OR_COLUMN

dbo       T1        col1      OBJECT_OR_COLUMN

 

The output shows that dbo.Proc1 depends on the table dbo.T1 and the column dbo.T1.col1. Again, dependencies that appear in dynamic code are not identified.

The sys.dm_sql_referencing_entities DMF provides all entities that reference the input entity—in other words, all entities that depend on the input entity. For example, the following code returns all entities that depend on dbo.T1:

 

SELECT

  referencing_schema_name AS objschema,

  referencing_entity_name AS objname,

  referencing_class_desc  AS class

FROM sys.dm_sql_referencing_entities('dbo.T1', 'OBJECT');

 

This code returns the following output:

 

objschema objname   class

--------- --------- ----------------

dbo       Proc1     OBJECT_OR_COLUMN

dbo       V1        OBJECT_OR_COLUMN

dbo       V2        OBJECT_OR_COLUMN

 

The output shows that dbo.Proc1, dbo.V1, and dbo.V2 depend on dbo.T1.
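
You can also restrict sys.sql_expression_dependencies to a single referencing object. For example, the following query, a sketch along the lines of the earlier one, returns only the entities that dbo.V1 references by name:

SELECT
  referenced_schema_name AS tgtschema,
  referenced_entity_name AS tgtname
FROM sys.sql_expression_dependencies
WHERE referencing_id = OBJECT_ID(N'dbo.V1');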

Change Data Capture

Change data capture is a new mechanism in SQL Server 2008 that enables you to easily track data changes in a table. The changes are read by a capture process from the transaction log and recorded in change tables. Those change tables mirror the columns of the source table and also contain metadata information that can be used to deduce the changes that took place. Those changes can be consumed in a convenient relational format through TVFs.

An extract, transform, and load (ETL) process in SQL Server Integration Services that applies incremental updates to a data warehouse is just one example of an application that can benefit from change data capture.

Here, I’ll demonstrate a simple process of capturing changes against an Employees table in a database called testdb.

Before you can enable tables for change data capture, you must first enable the database for change data capture by using the stored procedure sys.sp_cdc_enable_db. This stored procedure creates several change data capture-related system objects in the database, including the cdc schema, the cdc user, tables, jobs, stored procedures, and functions. To check whether the database is enabled for change data capture, query the is_cdc_enabled column in sys.databases. The following code creates a database called testdb and enables change data capture in the database:

 

USE master;

IF DB_ID('testdb') IS NOT NULL DROP DATABASE testdb;

CREATE DATABASE testdb;

GO

USE testdb;

EXECUTE sys.sp_cdc_enable_db;

 

To disable change data capture on the database, use the stored procedure sys.sp_cdc_disable_db.
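
For example, the following sketch confirms that testdb is now enabled; the disable call is shown commented out so that you can continue with the walkthrough:

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'testdb';

-- EXECUTE sys.sp_cdc_disable_db; -- would disable change data capture again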

The capture process starts reading changes from the log and recording them in the change tables as soon as the first table in the database is enabled for change data capture. To enable change data capture for a table, you use the sys.sp_cdc_enable_table stored procedure. Note that SQL Server Agent must be running for the capture process to work. The following code creates a table named Employees in the testdb database and inserts one employee row into the table:

 

CREATE TABLE dbo.Employees

(

  empid  INT         NOT NULL,

  name   VARCHAR(30) NOT NULL,

  salary MONEY       NOT NULL

);

INSERT INTO dbo.Employees(empid, name, salary) VALUES(1, 'Emp1', 1000.00);

 

The following code enables the table for change data capture:

 

EXECUTE sys.sp_cdc_enable_table

    @source_schema = N'dbo'

  , @source_name = N'Employees'

  , @role_name = N'cdc_Admin';

 

Because this is the first table in the database that is enabled for change data capture, this code also causes the capture process to start (two jobs will start: cdc.testdb_capture and cdc.testdb_cleanup). The @role_name argument enables you to assign a database role that will be granted access to the change data. If the specified role does not exist, SQL Server creates it. By default, changes are captured for all table columns. To capture changes only in a subset of columns, you can specify the list of columns in the @captured_column_list argument. To get metadata information about the columns included in the capture instance, you can use the sys.sp_cdc_get_captured_columns procedure.
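
For example, had you wanted to track only the empid and salary columns, you could have passed a column list when enabling the table, as in the following sketch (shown commented out because a table cannot be enabled twice with the same capture instance):

/*
EXECUTE sys.sp_cdc_enable_table
    @source_schema = N'dbo'
  , @source_name = N'Employees'
  , @role_name = N'cdc_Admin'
  , @captured_column_list = N'empid, salary';
*/

-- List the columns captured by the existing capture instance
-- (dbo_Employees is the default capture instance name):
EXECUTE sys.sp_cdc_get_captured_columns @capture_instance = N'dbo_Employees';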

If you later want to disable change data capture for a table, use the sys.sp_cdc_disable_table stored procedure. And to check whether a table is enabled for change data capture, query the is_tracked_by_cdc column in the sys.tables view.
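
For example, the following query confirms that the Employees table is now tracked:

SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE name = N'Employees';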

To get information about the change data capture configuration for each enabled table, use the stored procedure sys.sp_cdc_help_change_data_capture as follows:

 

EXECUTE sys.sp_cdc_help_change_data_capture;

 

The changes are not consumed directly from the change tables but rather through table-valued functions, which make it easy to identify and consume only the delta of changes that has not yet been consumed. Changes are requested for a specified range of log sequence numbers (LSNs). SQL Server 2008 also provides the mapping functions sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time, which help you convert a date-time range to a range of LSNs and vice versa.
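
For example, the following sketch maps a date-time range to an LSN range, using the inline variable initialization described earlier in this paper (the start date is a placeholder; the function returns NULL if no matching LSN exists):

DECLARE
  @from_lsn BINARY(10) = sys.fn_cdc_map_time_to_lsn(
    'smallest greater than or equal', '20080701 00:00'),
  @to_lsn   BINARY(10) = sys.fn_cdc_map_time_to_lsn(
    'largest less than or equal', GETDATE());
SELECT @from_lsn AS from_lsn, @to_lsn AS to_lsn;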

Separate functions provide all changes within an interval (cdc.fn_cdc_get_all_changes_<capture_instance>) and, if the option was enabled when calling sys.sp_cdc_enable_table (the @supports_net_changes argument), the net changes that took place against distinct rows (cdc.fn_cdc_get_net_changes_<capture_instance>).
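
To see the all-changes function in action, the following sketch modifies the employee row and then reads every change recorded for the default capture instance dbo_Employees. Because the capture job harvests the log asynchronously, the sketch waits briefly before querying:

UPDATE dbo.Employees SET salary = 1100.00 WHERE empid = 1;
GO
WAITFOR DELAY '00:00:10'; -- give the capture job time to read the log
DECLARE
  @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn(N'dbo_Employees'),
  @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();
-- __$operation: 1 = delete, 2 = insert, 4 = update (after image)
SELECT __$operation, empid, name, salary
FROM cdc.fn_cdc_get_all_changes_dbo_Employees(@from_lsn, @to_lsn, N'all');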

SQL Server 2008 also has a stored procedure named sys.sp_cdc_get_ddl_history that gives you the DDL change history associated with a specified capture instance.
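
For example, the following call (a sketch) returns the DDL history for the capture instance created earlier:

EXECUTE sys.sp_cdc_get_ddl_history @capture_instance = N'dbo_Employees';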

Collation Alignment with Windows

SQL Server 2008 aligns support for collations with Microsoft Windows Server® 2008, Windows Vista®, Windows Server 2003, and Windows® XP Home Edition. The new SQL Server version adds new collations and revises existing collation versions. You can recognize the new and revised collations by finding the number 100 (the internal version number of SQL Server 2008) in their names. You can use the following query to get the list of new and revised collations:

 

SELECT name, description

FROM sys.fn_helpcollations() AS C

WHERE name LIKE '%100%';

 

This query returns more than 1,400 rows representing different variations of 84 new collations.

The new collations introduce significant changes. Many of them address East Asian languages that support supplementary characters. Chinese_Taiwan_Stroke_100 and Chinese_Taiwan_Bopomofo_100 now assign culturally correct weights to each character, specifically the Extension A and B characters.

The new collations also provide string comparisons between supplementary characters based on linguistic sorting, and a new binary flag, binary-code point, supports true code-point comparisons.
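
For example, the following sketch contrasts a linguistic comparison under one of the version-100 collations with a true code-point comparison under its binary-code point counterpart:

SELECT
  CASE WHEN N'abc' COLLATE Latin1_General_100_CI_AS = N'ABC'
       THEN 'equal' ELSE 'not equal' END AS linguistic_ci,     -- equal
  CASE WHEN N'abc' COLLATE Latin1_General_100_BIN2 = N'ABC'
       THEN 'equal' ELSE 'not equal' END AS binary_code_point; -- not equal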

Note that some collations supported in SQL Server 2005 are deprecated in SQL Server 2008, including the Korean, Hindi, Macedonian, and Lithuanian_Classic Windows collations and the SQL_ALTDiction_CP1253_CS_AS SQL collation. These collations are still supported in SQL Server 2008 for backward compatibility, but they do not show up in the Setup collation list and are not returned when querying the fn_helpcollations function.

Deprecated Features

SQL Server 2008 enforces its deprecation policy more rigorously than previous versions did. SQL Server 2008 discontinues support for the following features (partial list):

·         60, 65, and 70 compatibility levels—at a minimum, the database must be at compatibility level 80 (2000)

·         DUMP and LOAD commands—use the BACKUP and RESTORE commands instead

·         BACKUP LOG WITH TRUNCATE_ONLY, BACKUP LOG WITH NO_LOG, and BACKUP TRANSACTION statements

·         The sp_addalias stored procedure—replace aliases with user accounts and database roles

·         sp_addgroup, sp_changegroup, sp_dropgroup, and sp_helpgroup stored procedures—use roles instead

·         The Surface Area Configuration Tool

·         Window calculations (using the OVER clause) are not allowed in the recursive member of recursive common table expressions (CTEs)

SQL Server provides performance counters and trace events to keep track of deprecated feature usage. The SQLServer:Deprecated Features object provides the Usage counter and an instance for each deprecated feature. As for trace events, the Deprecation Final Support event class occurs when you use a feature that will be removed from the next version of SQL Server. The Deprecation Announcement event class occurs when you use a feature that will be removed from a future version of SQL Server.
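
One convenient way to inspect the Usage counter from Transact-SQL is through the sys.dm_os_performance_counters view, as in the following sketch (the object name carries an instance-specific prefix, so the filter uses LIKE):

SELECT instance_name AS deprecated_feature, cntr_value AS usage_count
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Deprecated Features%'
ORDER BY cntr_value DESC;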

For the full list of discontinued features as well as a list of features planned for deprecation in the next and future versions of SQL Server, see SQL Server Books Online.

Conclusion

This paper introduces Transact-SQL enhancements and SQL/CLR improvements in SQL Server 2008. It covers Transact-SQL delighters, data type enhancements, Data Manipulation Language (DML) enhancements, Data Definition Language (DDL) enhancements, SQL/common language runtime (CLR) enhancements, object dependencies, change data capture, collation alignment with Windows, and deprecated features.

The new and enhanced features in SQL Server 2008 help you improve the performance of your database, enhance database support beyond the relational realm, and gain more functionality—all of which make you better equipped to support your enterprise data platform needs.

About the Author

Itzik Ben-Gan is a Mentor and Co-Founder of Solid Quality Mentors. A SQL Server Microsoft Most Valuable Professional (MVP) since 1999, Itzik has delivered numerous training events around the world focused on Transact-SQL querying, query tuning, and programming. Itzik is the author of Inside Microsoft SQL Server 2005: T-SQL Querying (MSPress) and Inside Microsoft SQL Server 2005: T-SQL Programming (MSPress) and is a co-author of Advanced Transact-SQL for SQL Server 2000 (APress). He has written many articles for SQL Server Magazine as well as articles and white papers for MSDN. Itzik's speaking activities include Tech Ed, DevWeek, PASS, and SQL Server Magazine Connections, as well as various user groups and other engagements around the world.

 

For more information:

Microsoft SQL Server Web site

Microsoft SQL Server TechCenter

Microsoft SQL Server DevCenter

 

Did this paper help you? Please give us your feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why? For example:

·         Are you rating it high due to having good examples, excellent screenshots, clear writing, or another reason?

·         Are you rating it low due to poor examples, fuzzy screenshots, or unclear writing?

This feedback will help us improve the quality of white papers we release. Send feedback.