SQL Server 2008
New Data Types
At a Glance:
- New date and time data types
- Representing position in a hierarchy
- Two models for working with spatial data
Conducting business in this global economy increasingly requires companies to use new types of data, applications, and complex calculations. The seven new data types built into SQL Server 2008
provide the means for working with and simplifying the management of more complicated data.
Date and Time
With the old datetime data type, SQL Server® users did not have the ability to work with date and time information separately. Four of the new data types—date, time, datetime2, and datetimeoffset—change that, simplifying working with date and time data and providing for increased date range, fractional seconds precision, and time zone support. New database applications should use these new data types instead of the legacy datetime. Let's take a closer look at the new versions.
The date data type stores a date without a time component. The range is from January 1, 1000 though December 31, 9999 (0001-01-01 through 9999-12-31). Each date variable requires 3 bytes of storage and has a precision of 10 digits. The accuracy of the date type is limited to a single day.
Take a look at Figure 1, which shows how to create and initialize Date variables in T-SQL scripts. The variable @myDate1 is initialized by a string formatted as 'MM/DD/YYYY.' The variable @myDate2 is not initialized; it will have a value of NULL. The variable @myDate3 is initialized to the date on the local computer system. The value of variables can be changed at any time with the SELECT or SET statements, as the example illustrates by changing the value of @myDate2. Date columns can also be created in tables. Figure 2 shows how to create a table with three date columns.
Figure 1 Create and initialize date variables in T-SQL scripts
DECLARE @myDate1 date = '01/22/2005' DECLARE @myDate2 date DECLARE @myDate3 date = GetDate() SELECT @myDate2 = '2007-05-08 12:35:29.1234567 +12:15' SELECT @myDate1 AS '@myDate1', @myDate2 AS '@myDate2', @myDate3 AS '@myDate3' --Results --@myDate1 @myDate2 @myDate3 ------------ ---------- ---------- --2005-01-22 2007-05-08 2007-11-20
Figure 2 Create a table with three date columns
USE TempDB GO CREATE TABLE myTable ( myDate1 date,myDate2 date,myDate3 date ) GO INSERT INTO myTable VALUES('01/22/2005', '2007-05-08 12:35:29.1234567 +12:15', GetDate()) SELECT * FROM myTable --Results --myDate1 myDate2 myDate3 ------------ ---------- ---------- --2005-01-22 2007-05-08 2007-11-20
The time data type stores the time of day without any date component. It is based on a 24-hour clock, so the supported range is 00:00:00.0000000 through 23:59:59.9999999 (hours, minutes, seconds, and fractional seconds). You can specify the fractional second's precision when the data type is created. The default precision is 7 digits; the accuracy is 100ns. Precision affects how much storage space is required, which can range from 3 bytes for up to 2 digits; 4 bytes for 3 or 4 digits; to 5 bytes for 5 to 7 digits.
The T-SQL script in Figure 3 shows how variable precision is affected by implicit conversions of a string initialization value. The T-SQL code first creates and initializes eight separate time variables to an identical value. The fractional precision of each variable is equivalent to its name. For example, @myTime3 has a fractional precision of three places. The results show that the precision of each time data type is equivalent to the fractional precision with which it is declared. Digits that fall outside the range are truncated.
Figure 3 Display time data type's variable precision
DECLARE @myTime time = '01:01:01.1234567 +01:01' DECLARE @myTime1 time(1) = '01:01:01.1234567 +01:01' DECLARE @myTime2 time(2) = '01:01:01.1234567 +01:01' DECLARE @myTime3 time(3) = '01:01:01.1234567 +01:01' DECLARE @myTime4 time(4) = '01:01:01.1234567 +01:01' DECLARE @myTime5 time(5) = '01:01:01.1234567 +01:01' DECLARE @myTime6 time(6) = '01:01:01.1234567 +01:01' DECLARE @myTime7 time(7) = '01:01:01.1234567 +01:01' SELECT @myTime AS '@myTime', @myTime1 AS '@myTime1', @myTime2 AS '@myTime2', @myTime3 AS '@myTime3', @myTime4 AS '@myTime4', @myTime5 AS '@myTime5', @myTime6 AS '@myTime6', @myTime7 AS '@myTime7' --Results --@myTime @myTime1 @myTime2 @myTime3 @myTime4 ------------------ ---------- ----------- ------------ ------------- --01:01:01.1234567 01:01:01.1 01:01:01.12 01:01:01.123 01:01:01.1235 -- --@myTime5 @myTime6 @myTime7 ---------------- --------------- ---------------- --01:01:01.12346 01:01:01.123457 01:01:01.1234567 DROP TABLE myTable
A time data type can be created as a column in a table. The DROP TABLE myTable T-SQL script in Figure 4 creates a table named myTable1 and adds three time columns to the table. A record is then inserted into the table and the contents of the table are displayed with a SELECT statement.
Figure 4 Create myTable1
USE TempDB GO CREATE TABLE myTable1 ( myTime1 time(1), myTime2 time(2), myTime3 time(3) ) GO INSERT INTO myTable1 VALUES('01:30:01.1234567', '02:34:01.1234567', '03:01:59.1234567') SELECT * from myTable1 --Results --myTime1 myTime2 myTime3 ------------ ----------- ------------ --01:30:01.1000000 02:34:01.1200000 03:01:59.1230000 DROP TABLE myTable1
Datetimeoffset and Datetime2
The datetimeoffset data type provides time-zone awareness. The time data type does not contain a time zone and so only works on local time. In the global economy, however, it is often necessary to know how a time in one area of the world relates to time in another area. The time zone offset is indicated as + or - hh:mm.
This code creates a datetimeoffset variable and initializes it to the time value 8:52 A.M. Pacific Standard Time:
DECLARE @date DATETIMEOFFSET = '2007-11-26T08:52:00.1234567-08:00' PRINT @date --Results --2007-11-26 08:52:00.1234567 -08:00
The string that initializes the datetimeoffset variable (@date in the script) is specially formatted, arranged from the most significant element to the least. A single capital letter T separates the date and time elements. A minus sign separates the time elements from the time zone. There are no spaces between the minus sign and the time or time zone elements. This format is one of two ISO 8601 formats supported by the datetimeoffset data type. (ISO 8601 is an international standard for representation of date and time values.)
The precision of the time component is specified just as for the time data type, and it defaults to the same seven digits if unspecified. The supported range is the same.
The datetime2 data type is an extension of the original datetime type. It supports a larger date range and greater fractional-seconds precision, and it allows you to specify the precision. The date range of the datetime2 type is January 1, 0001 through December 31, 9999, compared with the January 1, 1753 through December 31, 9999 range of the original datetime. As with the time type, seven fractional seconds of precision are provided. The original datetime type provided three digits of precision and a time range of 00:00:00 through 23:59:59.999. Here's how to create a datetime2 variable and initialize it to the local server date and time:
DECLARE @datetime2 DATETIME2 = GetDate(); PRINT @datetime2 --Results --2007-11-26 09:39:04.1370000
Next I'll look at the new hierarchyid data type. This data type works with the relationship among data elements in a table, rather than a specific date or time data.
The Hierarchyid Data Type
The hierarchyid data type allows you to construct relationships among data elements within a table, specifically to represent a position in a hierarchy. To explore this data type, let's start by creating the MyCompany database and filling it with employee data, using the script in Figure 5.
Figure 5 Create and populate the MyCompany database
USE MASTER GO CREATE DATABASE MyCompany GO USE MyCompany GO --Create a table called employee that will store --the data for the employees for MyCompany. CREATE TABLE employee ( EmployeeID int NOT NULL, EmpName varchar(20) NOT NULL, Title varchar(20) NULL, Salary decimal(18, 2) NOT NULL, hireDate datetimeoffset(0) NOT NULL, ) GO --These statements will insert the data for the employees of MyCompany. INSERT INTO employee VALUES(6, 'David', 'CEO', 35900.00, '2000-05-23T08:30:00-08:00') INSERT INTO employee VALUES(46, 'Sariya', 'Specialist', 14000.00, '2002-05-23T09:00:00-08:00') INSERT INTO employee VALUES(271, 'John', 'Specialist', 14000.00, '2002-05-23T09:00:00-08:00') INSERT INTO employee VALUES(119, 'Jill', 'Specialist', 14000.00, '2007-05-23T09:00:00-08:00') INSERT INTO employee VALUES(269, 'Wanida', 'Assistant', 8000.00, '2003-05-23T09:00:00-08:00') INSERT INTO employee VALUES(272, 'Mary', 'Assistant', 8000.00, '2004-05-23T09:00:00-08:00') GO --Results --EmployeeID EmpName Title Salary hireDate ------------- ------- ---------- -------- -------------------------- --6 David CEO 35900.00 2000-05-23 08:30:00 -08:00 --46 Sariya Specialist 14000.00 2002-05-23 09:00:00 -08:00 --271 John Specialist 14000.00 2002-05-23 09:00:00 -08:00 --119 Jill Specialist 14000.00 2007-05-23 09:00:00 -08:00 --269 Wanida Assistant 8000.00 2003-05-23 09:00:00 -08:00 --272 Mary Assistant 8000.00 2004-05-23 09:00:00 -08:00
Figure 6 shows the resulting simple database consisting of a single employee table. This employee table in the MyCompany database does not have any imposed structure. This is normal for a relational database, as structure is imposed dynamically by an application through its query and processing code.
Figure 6 MyCompany employee table
Business data, though, typically has a type of inherent structure. For example, every business has a reporting structure, such as that shown for MyCompany in Figure 7. All employees of MyCompany report to David, the CEO. Some employees report directly, as in the case of Jill. Others, like Mary, report through an intermediary. In programming terms, the reporting structure for MyCompany is referred to as a tree because that's what its shape resembles. David, at the top, reports to no one; he is the parent or ancestor. The employees that report to David are beneath. Such nodes are referred to as children or descendants. David can have as many descendants as needed to represent his direct reports.
Figure 7 MyCompany's organizational structure (Click the image for a larger view)
The script in Figure 8 rebuilds the MyCompany database using the hierarchyid data type, constructing a relationship that matches the reporting structure for MyCompany. The ALTER TABLE statement is used to first add a column of type hierarchyid. David's node is then inserted using hierarchyid's GetRoot method. David's directs are then added to the tree using the GetDescendant method.
Figure 8 Rebuild the database using hierarchyid
DELETE employee GO ALTER TABLE employee ADD OrgNode hierarchyid NOT NULL GO DECLARE @child hierarchyid, @Manager hierarchyid = hierarchyid::GetRoot() --The first step is to add the node at the top of the --tree. Since David is the CEO his node will be the --root node. INSERT INTO employee VALUES(6, 'David', 'CEO', 35900.00, '2000-05-23T08:30:00-08:00', @Manager) --The next step is to insert the records for --the employees that report directly to David. SELECT @child = @Manager.GetDescendant(NULL, NULL) INSERT INTO employee VALUES(46, 'Sariya', 'Specialist', 14000.00, '2002-05-23T09:00:00-08:00', @child) SELECT @child = @Manager.GetDescendant(@child, NULL) INSERT INTO employee VALUES(271, ‚John', ‚Specialist', 14000.00, '2002-05-23T09:00:00-08:00', @child) SELECT @child = @Manager.GetDescendant(@child, NULL) INSERT INTO employee VALUES(119, ‚Jill', ‚Specialist', 14000.00, ‚2007-05-23T09:00:00-08:00', @child) --We can now insert the employee that reports to --Sariya. SELECT @manager = OrgNode.GetDescendant(NULL, NULL) FROM employee WHERE EmployeeID = 46 INSERT INTO employee VALUES(269, ‚Wanida', ‚Assistant', 8000.00, ‚2003-05-23T09:00:00-08:00', @manager) --Next insert the employee that report to John. SELECT @manager = OrgNode.GetDescendant(NULL, NULL) FROM employee WHERE EmployeeID = 271 INSERT INTO employee VALUES(272, ‚Mary', ‚Assistant', 8000.00, ‚2004-05-23T09:00:00-08:00', @manager) GO
Once the database records are added and the hierarchy is constructed, the contents of the employee table can be displayed with a query like this:
SELECT EmpName, Title, Salary, OrgNode.ToString() AS OrgNode FROM employee ORDER BY OrgNode GO --Results --EmpName Title Salary OrgNode ---------- ---------- --------- ------- --David CEO 35900.00 / --Sariya Specialist 14000.00 /1/ --Wanida Assistant 8000.00 /1/1/ --John Specialist 14000.00 /2/ --Mary Assistant 8000.00 /2/1/ --Jill Specialist 14000.00 /3/
OrgNode is the hierarchyid column. Each slash / character in the result indicates a node in the hierarchy tree. David is at the root, which is shown with a single slash. Sariya, John, and Jill report to David and have two slash marks, indicating that they are the second node in the hierarchy. The numbers 1, 2, or 3 show the order of the respective child node. This system is very flexible. Child nodes can be removed, inserted, or added as needed. If we added an employee between John and Jill, for example, that employee would be listed in the resultset as: /2.1/.
To answer the question, for example, "Who reports to Sariya?" you can create a query as shown in the following T-SQL code:
DECLARE @Sariya hierarchyid SELECT @Sariya = OrgNode FROM employee WHERE EmployeeID = 46 SELECT EmpName, Title, Salary, OrgNode.ToString() AS 'OrgNode' FROM employee WHERE OrgNode.GetAncestor(1) = @Sariya GO --Results --EmpName Title Salary OrgNode --------- --------- ------- ------- --Wanida Assistant 8000.00 /1/1/
The query uses the hierarchyid's GetAncestor method, which returns the parent of the current hierarchyid node. In the previous code, the variable @Sariya is set to the hierarchy node for Sariya. This is because Sariya is the direct ancestor of any employee that reports to her. Therefore, writing a query that returns the employees that report directly to Sariya consists of retrieving Sariya's node from the tree and then selecting all employees whose ancestor node is Sariya's node.
Hierarchyid columns tend to be very compact because the number of bits required to represent a node in a tree depends on the average number of children for the node (commonly referred to as the node's fanout). For example, a new node in an organizational hierarchy of 100,000 employees, with an average fanout of six levels, would take about five bytes of storage.
The hierarchyid data type provides several methods that facilitate working with hierarchical data. A summary of these methods is shown in Figure 9
. Detailed information on all of the methods is available in SQL Server Books Online (
Figure 9 Methods provided by the hierarchyid data type
|GetAncestor||Returns a hierarchyid that represents the nth ancestor of this hierarchyid node.|
|GetDescendant||Returns a child node of this hierarchyid node.|
|GetLevel||Returns an integer that represents the depth of this hierarchyid node in the overall hierarchy.|
|GetRoot||Returns the root hierarchyid node of this hierarchy tree. Static.|
|IsDescendant||Returns true if the passed-in child node is a descendant of this hierarchyid node.|
|Parse||Converts a string representation of a hierarchy to a hierarchyid value. Static.|
|Reparent||Moves a node of a hierarchy to a new location within the hierarchy.|
|ToString||Returns a string that contains the logical representation of this hierarchyid.|
Spatial Data Types
Spatial data is data that identifies geographic locations and shapes, primarily on the Earth. These can be landmarks, roads, even the location of a business. SQL Server 2008 provides the geography and geometry data types for working with this type of data.
The geography data type works with round-earth information. The round-earth model factors the curved surface of the earth into account in its calculations. Position information is given in longitude and latitude. This model is well-suited to applications like transoceanic shipping, military planning, and short-range applications referenced to the earth's surface. This is the model to use if your data is stored in latitudes and longitudes.
The geometry data type works with the flat-earth or planar model. In this model, the earth is treated as a flat projection beginning at a known point. The flat-earth model does not take the curvature of the earth into account, so it is primarily used for describing short distances—for example in a database application that maps the interior of a building.
The geography and geometry types are constructed from vector objects, specified in Well-Known Text (WKT) or Well-Known Binary (WKB) format. These are transport formats for spatial data described by the Open Geospatial Consortium (OGC) Simple Features for SQL Specification. Figure 10 lists the seven types of vector objects supported by SQL Server 2008.
Figure 10 Vector objects supported by SQL Server 2008
|MultiPoint||A series of points.|
|LineString||A series of zero or more points connected by lines.|
|MultiLineString||A set of linestrings.|
|Polygon||A contiguous region described by a set of closed linestrings.|
|MultiPolygon||A set of polygons.|
|GeometryCollection||A collection of geometry types.|
To construct a geography type with one or more vector objects, you first declare the geography type in your T-SQL script, as shown in Figure 11. You then call one of the methods listed in Figure 12 and pass in the character string for your vector object and the Spatial Reference ID (SRID). The SRID is the spatial reference identification system, defined by the European Petroleum Survey Group. It is part of a set of standards developed for cartography, surveying, and geodetic data storage. Each SRID identifies a specific type of ellipsoid to use in geography calculations. This is necessary because the earth is not a perfect sphere. SQL Server 2008 can only perform calculations on identical SRIDs.
Figure 12 Construct objects for geography and geometry
|STGeomFromText||Constructs any type of geography instance from input text.|
|STPointFromText||Constructs a geography Point instance from the input text.|
|STMPointFromText||Constructs a geography MultiPoint instance from input text.|
|STLineFromText||Constructs a geography LineString instance from input text.|
|STMLineFromText||Constructs a geography MultiLineString instance from input text.|
|STPolyFromText||Constructs a geography Polygon instance from input text.|
|STMPolyFromText||Constructs a geography MultiPolygon instance from input text.|
|STGeomCollFromText||Constructs a geography GeometryCollection instance from input text.|
Figure 11 Create points, lines, and polygon geometry
DECLARE @geo1 geometry SELECT @geo1 = geometry::STGeomFromText('POINT (3 4)', 0) PRINT @geo1.ToString() DECLARE @geo2 geometry SELECT @geo2 = geometry::Parse('POINT(3 4 7 2.5)') PRINT @geo2.STX; PRINT @geo2.STY; PRINT @geo2.Z; PRINT @geo2.M; DECLARE @geo3 geography; SELECT @geo3 = geography::STGeomFromText( 'LINESTRING(47.656 -122.360, 47.656 -122.343)', 4326); SELECT @geo3.ToString(); --Results --POINT (3 4) --3 --4 --7 --2.5 DECLARE @gx geometry; SET @gx = geometry::STPolyFromText( 'POLYGON ((5 5, 10 5, 10 10, 5 5))', 0); PRINT @gx.ToString(); --Results --POLYGON ((5 5, 10 5, 10 10, 5 5))
Differences between Geography and Geometry
The geography and geometry data types are meant for working with different types of data, so there are some key differences to be aware of. With the geometry data type, distances and areas are given in the same unit of measurement as the coordinates of the instances. For example, the distance between the points (0,0) and (6,8) will always be 10 units. This is not the case with the geography type, which works with ellipsoidal coordinates that are expressed in degrees of latitude and longitude.
The GEOMETRY data type returns incoherent results when coordinates are expressed as latitude longitude pairs. The following T-SQL code calculates the distance between the POINTS (90 0) and (90 180). Both of these points refer to the North Pole, so the distance between them should be 0. In the GEOMETRY case the calculated distance is 180.
DECLARE @g1 GEOMETRY, @g2 GEOMETRY, @g3 GEOGRAPHY, @g4 GEOGRAPHY SELECT @g1 = GEOMETRY::STGeomFromText('POINT (90 0)', 0) SELECT @g2 = GEOMETRY::STGeomFromText('POINT (90 180)', 0) SELECT @g3 = GEOGRAPHY::STGeomFromText('POINT (90 0)', 4326) SELECT @g4 = GEOGRAPHY::STGeomFromText('POINT (90 180)', 4326) SELECT @g2.STDistance(@g1) AS 'GEOMETRY', @g4.STDistance(@g3) AS 'GEOGRAPHY'; --Results --GEOMETRY GEOGRAPHY ------------------------ ---------------------- --180 0
The orientation of spatial data is different for the two data types as well. In the planar system used by the geometry data type, orientation of the polygon is not an important factor. For example, a polygon with the coordinates ((0, 0), (10, 0), (0, 20), (0, 0)) is considered the same as the polygon ((0, 0), (0, 20), (10, 0), (0, 0)). But in the data model used by the geography data type, a polygon is unclear without a specified orientation. Consider, for example, a ring around the equator. Does the polygon described by this ring refer to the northern or southern hemisphere? The point here is that when working with geography data, the orientation and the location must be accurately described.
There are also some restrictions placed on the geography data type by SQL Server 2008. For example, each geography instance must fit inside a single hemisphere. Larger spatial objects are not allowed and will cause an ArgumentException to be thrown. Geography data types requiring two inputs will return NULL if the results from the methods do not fit inside a single hemisphere.
SQL Server provides a number of methods that allow operations to be performed on geography and geometry instances. Figure 13 shows some examples using methods that SQL Server 2008 provides to work with spatial data. Due to space constraints, I cannot go into more detail on this topic, but you can find full descriptions in SQL Server Books Online.
Figure 13 Working with spatial data
DECLARE @gm geometry; DECLARE @gg geography; DECLARE @h geography; SET @gm = geometry::STGeomFromText('POLYGON((0 0, 13 0, 3 3, 0 13, 0 0),(2 2, 2 1, 1 1, 1 2, 2 2))', 0); SELECT @gm.STArea(); --Results --38 SET @gg = geography::STGeomFromText('LINESTRING(0 0, 5 5)', 4326); --Calculate the distance to a point slightly offset from the LINESTRING. SET @h = geography::STGeomFromText('POINT(4 4)', 4326); SELECT @gg.STDistance(@h); --Results -- 430.182777043046 --Calculate the distance to a point on the LINESTRING. SET @h = geography::STGeomFromText('POINT(5 5)', 4326); SELECT @gg.STDistance(@h); --Results -- 0 DECLARE @temp table ([name] varchar(10), [geom] geography); INSERT INTO @temp values ('Point', geography::STGeomFromText('POINT( 5 10)', 4326)); INSERT INTO @temp values ('LineString', geography::STGeomFromText( 'LINESTRING(13 5, 50 25)', 4326)); --Calculate the distance to a point on the LINESTRING. --Display the number of dimensions for a geography object stored in a --table variable. INSERT INTO @temp values ('Polygon', geography::STGeomFromText( 'POLYGON((47.653 -122.358, 47.649 -122.348, 47.658 -122.348, 47.658 -122.358, 47.653 -122.358))', 4326)); SELECT [name], [geom].STDimension() as [dim] FROM @temp; --Results --name dim ------------ ----------- --Point 0 --LineString 1 --Polygon 2
I hope this information on the seven new data types in SQL Server 2008 has been helpful.
Kelly Wilson has been in software engineering for more than 20 years. Her credits include applications in SQL Server, 3D graphics, gaming, and color science. Kelly is currently a Programming Writer with the SQL Server group at Microsoft.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.