This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Have a Nice Date

Alexander Kozak

Some things are so much a part of our daily routines that we just take them for granted; they work reliably day after day, and we don't even think to wonder about what's going on behind the scenes. Occasionally, though, we're jolted out of our passive acquiescence, only to be enchanted by the underlying "nuts and bolts." In this article, Alexander Kozak describes some of his finds related to dates. (Subscribers may want to re-read Andrew Zanevsky's February 1997 column, "Year 2000, 2050, 2079...," included in this month's Download. Available online are Anthony Perkins' article "What Day is It?" [January 2000], Andrew's "Granting Wishes with UDF" [September 2000], Ron Talmage's "The Dangers of Millisecond Arithmetic" [March 2003], and Tom Moreau's "Calculating Business Days" [July 2001].)

Not too long ago, one of our project managers asked me if there was a SQL Server function to return just the date, without the time portion. I knew the answer, of course, because of all the times I've coded T-SQL workarounds like this:

  SELECT sampleDate = CONVERT(VARCHAR(50), Date1, 101)
FROM tbl_SampleDate

or this:

  SELECT sampleDate =
  CAST(DATEPART(mm, Date1) AS varchar(50)) + '/' +
  CAST(DATEPART(dd, Date1) AS varchar(50)) + '/' +
  CAST(DATEPART(yy, Date1) AS varchar(50))
FROM tbl_SampleDate

	It looks ugly and doesn't set any speed records, but it does do the job. However, if you need to do this frequently, or against Very Large Databases (VLDBs), as we did, you're likely to start searching for ways to optimize the code.

Datetime internals

At the risk of boring you with basics, permit me a brief review of datetime internals. SQL Server's two date-and-time data types, smalldatetime and datetime, differ in precision, range, and required storage.

	The smalldatetime data type stores dates and times of day as two two-byte integers. The first two bytes store the number of days after January 1, 1900, and the other two bytes store the number of minutes since midnight. Dates range from January 1, 1900, through June 6, 2079, with accuracy to the minute: values with 29.998 seconds or less are rounded down to the nearest minute, and values with 29.999 seconds or more are rounded up.

	Datetime values are stored internally by SQL Server as two four-byte integers, where the first four bytes store the number of days before or after the base date, January 1, 1900, and the other four bytes store the time of day represented as the number of milliseconds after midnight. Datetime stores date and time data from January 1, 1753, through December 31, 9999, to an accuracy of 1/300th of a second (equivalent to 3.33 milliseconds, or 0.00333 seconds), and values are rounded to increments of .000, .003, or .007 seconds.
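	To make these layouts concrete, here's a small Python sketch (my illustration of the documented storage format, not SQL Server code; the function names are mine) that computes the integer pairs as smalldatetime and datetime would store them:

```python
from datetime import datetime

BASE = datetime(1900, 1, 1)  # SQL Server's base date

def smalldatetime_parts(dt):
    """Two 2-byte ints: days since the base date, minutes since midnight."""
    delta = dt - BASE
    return delta.days, delta.seconds // 60  # accuracy is one minute

def datetime_parts(dt):
    """Two 4-byte ints: days since the base date, 1/300-second clock ticks."""
    delta = dt - BASE
    seconds = delta.seconds + delta.microseconds / 1_000_000
    return delta.days, round(seconds * 300)  # 300 ticks/second = 3.33 ms

print(smalldatetime_parts(datetime(2003, 5, 22, 10, 30)))   # (37761, 630)
print(datetime_parts(datetime(2003, 5, 22, 10, 30, 15)))    # (37761, 11344500)
```

	Note that May 22, 2003, comes out as day 37,761; that integer will turn up again below.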

	Nice, but why 3.33 ms?

	Designers of any programming language, T-SQL included, have to face the question of how they're going to deal with time. The fundamental time question can be divided into the standard set of subtasks:

  • How to store time
  • How to process time
  • How to read (extract) time
  • How to represent (show) time

	In addition, designers need to decide the smallest time interval the system will be able to recognize. Because SQL Server is advertised as being able to run on Intel Pentiums or compatible processors with a clock speed of at least 166 MHz, that's a constraint. So is the basic binary architecture of CPUs.

Enough already

By now you're probably anxious to see some cool new function that magically extracts the date (time) portion from a datetime data type. Well, here's the brutal truth: I didn't try to create such a function, because I believe that SQL Server already provides an efficient programming interface for dealing with date and time via the T-SQL functions. I did come up with a trick, however, that allows faster retrieval of datetime data.

	I used Dell PowerEdge 4600, 6450, and 6650 servers with different hardware configurations, running Windows 2000 Advanced Server and SQL Server 2000 Enterprise Edition, to test the queries. I also tested the solution on a regular P4 desktop computer with 512MB of RAM and an IDE hard drive. All of the tests showed better performance when I used my method. (For simplicity's sake, the article will include only values from the Dell PowerEdge 6450 with dual 700 MHz CPUs and 2GB of RAM, running Windows 2000 Advanced Server and SQL Server 2000 Enterprise Edition. The database files are allocated on a SAN.)

	First, I ran the queries on a heap table, tblOrder_Heap, with 3,512,677 rows, 40 columns, and 2GB of allocated space. I wanted to find the number of transactions and the total dollar amount of the orders for each business day.

Query 1

  DBCC DROPCLEANBUFFERS
-- removes all clean pages from the buffer pool
GO
DBCC FREEPROCCACHE
-- clears the procedure cache
GO
SELECT COUNT(*), SUM(amt), [Business Day] =
  CONVERT(VARCHAR(100), TRANSACTION_DATE, 1)
FROM tblOrder_Heap
GROUP BY CONVERT(VARCHAR(100), TRANSACTION_DATE, 1)

	Note that I used the CONVERT function with style 1 to group by day. The execution time was 41 seconds.

Query 2

  -- DBCC code as above
SELECT COUNT(*), SUM(amt), [Business Day] =
  CAST(DATEPART(mm, transaction_date) AS VARCHAR(5)) + '/' +
  CAST(DATEPART(dd, transaction_date) AS VARCHAR(5)) + '/' +
  CAST(DATEPART(yy, transaction_date) AS VARCHAR(5))
FROM tblOrder_Heap
GROUP BY
  CAST(DATEPART(mm, transaction_date) AS VARCHAR(5)) + '/' +
  CAST(DATEPART(dd, transaction_date) AS VARCHAR(5)) + '/' +
  CAST(DATEPART(yy, transaction_date) AS VARCHAR(5))

	The second query looks ugly, but it executed in a surprisingly fast 29 seconds. The execution plans for both queries were almost identical, aside from the estimated row size. I deduced that the second query was faster because it used the date and time functions, which operate directly on the internal date and time structure rather than formatting the whole value as a string.

Query 3

What would happen, I wondered, if I ran the query grouping directly by the day count that's saved in the first two bytes of a smalldatetime (or in the first four bytes of a datetime)?

-- smalldatetime example
-- DBCC code as above
SELECT COUNT(*), SUM(amt), DayDiff =
  CONVERT(int, SUBSTRING(CONVERT(binary(4), transaction_date), 1, 2))
FROM tblOrder_Heap
GROUP BY
  CONVERT(int, SUBSTRING(CONVERT(binary(4), transaction_date), 1, 2))

	Query results were returned in a mere 14 seconds, almost three times faster than the first query!

	But there's something I don't like about this query: two CONVERTs and a SUBSTRING. I also don't like the fact that I'd have to edit the query (changing binary(4) to binary(8) and the instances of 1, 2 to 1, 4) depending on whether I was using smalldatetime or datetime. (By the way, I tried running the query without converting to integer, and it was a bit faster.)
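	To see what Query 3's SUBSTRING is actually doing, here's a Python sketch (illustrative only; it assumes big-endian byte order, which matches how the converted binary is displayed) that packs a smalldatetime-style value and recovers the day count from the first two bytes:

```python
import struct

def pack_smalldatetime(days, minutes):
    """Pack the two 2-byte ints big-endian, mimicking CONVERT(binary(4), ...)."""
    return struct.pack('>HH', days, minutes)

raw = pack_smalldatetime(37761, 630)      # 2003-05-22 10:30
day_bytes = raw[:2]                       # SUBSTRING(..., 1, 2) in the query
days = int.from_bytes(day_bytes, 'big')   # CONVERT(int, ...)
print(days)                               # 37761
```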

Query 4

Next I wondered if I could somehow avoid dealing with binary data altogether. Why not just use DATEDIFF(dd, start_date, end_date)?

-- DBCC code as above
SELECT COUNT(*), SUM(amt), dayDiff =
  DATEDIFF(dd, 'Jan 01, 1900', TRANSACTION_DATE)
FROM tblOrder_Heap
GROUP BY DATEDIFF(dd, 'Jan 01, 1900', TRANSACTION_DATE)

or:

  --DBCC code as above
SELECT COUNT(*), SUM(amt), dayDiff =
  DATEDIFF(dd, TRANSACTION_DATE, GETDATE())
FROM tblOrder_Heap
GROUP BY DATEDIFF(dd, TRANSACTION_DATE, GETDATE())

or:

  --DBCC code as before
SELECT COUNT(*), SUM(amt), dayDiff =
  DATEDIFF(dd, 0, TRANSACTION_DATE)
FROM tblOrder_Heap
GROUP BY DATEDIFF(dd, 0, TRANSACTION_DATE)

	All of the variants of Query 4 retrieved the result in 12 seconds!
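	The arithmetic behind these variants is nothing more than a whole-day count from a base date. Here's the same idea as a Python sketch (illustrative, not SQL Server code):

```python
from datetime import date, timedelta

BASE = date(1900, 1, 1)  # day 0, the date DATEDIFF(dd, 0, ...) counts from

def day_number(d):
    """Equivalent of DATEDIFF(dd, 0, d): whole days since the base date."""
    return (d - BASE).days

def from_day_number(n):
    """The reverse conversion: day number back to a calendar date."""
    return BASE + timedelta(days=n)

n = day_number(date(2003, 5, 22))
print(n)                    # 37761
print(from_day_number(n))   # 2003-05-22
```

	Grouping by that integer is cheap because no string formatting is involved at all.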

Query 5

The problem with Query 4, of course, is that it returns the date as an integer. It's not immediately obvious to most of us that 37761 equates to May 22, 2003. Query 5 solved that problem with a final execution time of 13 seconds.

--DBCC code as before
SELECT COUNT(*), SUM(amt), [Business Day] =
  CONVERT(DATETIME, DATEDIFF(dd, 0, TRANSACTION_DATE), 1)
FROM tblOrder_Heap
GROUP BY CONVERT(DATETIME, DATEDIFF(dd, 0, TRANSACTION_DATE), 1)

	I ran the same queries against the table tblOrder_Clustered, with exactly the same number of rows and columns. The only difference was that tblOrder_Clustered had a clustered index on orderID and 10 nonclustered indexes (among them an index on the transaction_date column). It turned out that the execution times were approximately the same as for the heap, illustrating that an extent scan of a heap table, driven by the IAM (Index Allocation Map), generally performs as well as a Clustered Index Scan, and sometimes even better, especially when you apply aggregate functions to the whole table. Table 1 shows the execution times for a table with exactly the same structure, but with 20,008,704 rows and 10GB of allocated space.

Table 1. Results of queries with 20,008,704 rows and 10GB of allocated space.

Query     Time
-------   ---------
Query 1   4:08 min.
Query 2   2:58 min.
Query 3   1:35 min.
Query 4   1:24 min.
Query 5   1:25 min.

Conclusions

I could show you other examples using joins, grouping by minutes, and so on. But I hope this article will inspire you to experiment for yourself.

	I realize there are drawbacks to using datetime internals as described here; for one, it isn't ANSI SQL compliant. Still, I can certainly recommend it for data warehouses and decision support systems, especially for extract, transform, and load (ETL) processes that run on SQL Server.

Download 405KOZAK.ZIP

When Editor Karen Watterson pointed out that I hadn't really answered my own rhetorical question about why SQL Server's smallest clock tick is 3.33 ms, I explained the calculations the SQL Server team must have made in coming up with the number. Here's what I said:

	We have four bytes to store days before or after January 1, 1900, and four more bytes for 3.33 ms clock ticks since midnight. To represent 24 hours in milliseconds, we need 24*60*60*1000 = 86,400,000 values, but we can store a number as large as 2^32 = 4,294,967,296 in our four storage bytes. Conclusion: The data storage structure isn't the limiting factor. Similarly, a CPU rated at 166 MHz should be sufficient to deliver nanosecond-range precision.
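	The back-of-the-envelope numbers are easy to verify; a quick Python check:

```python
ms_per_day = 24 * 60 * 60 * 1000    # milliseconds in 24 hours
four_byte_max = 2 ** 32             # distinct values four bytes can hold

print(ms_per_day)                   # 86400000
print(four_byte_max)                # 4294967296
print(four_byte_max // ms_per_day)  # 49 -- room to spare even at 1 ms precision
```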

	So it must be the demands of the software (Windows and SQL Server) that led the team to limit SQL Server's precision to 3.33 ms. For example, Windows, with potentially dozens of processes and hundreds of threads running at the same time, can produce thousands of context switches per second. In terms of time, that means you won't be able to work reliably at 1 ms or finer precision. Try this:

  declare @i int
set @i = 1
while(@i < 10)
begin
   print  convert(varchar(100), getdate(),121)
   set @i = @i + 1
end

	The result will be something like this:

   2004-03-20 16:44:33.450
   2004-03-20 16:44:33.450
   ...
   2004-03-20 16:44:33.450
   2004-03-20 16:44:33.450

	My SQL Profiler run indicates that the loop/batch took only 0 ms. What? Does that mean the loop spends 0 ms producing the result? No. It just means that the batch execution time was too short to be measured.

	If you increase the maximum value for the @i variable in your WHILE loop to @i < 50, @i < 100, @i < 150, and so on (running each batch five or six times), you'll see that the duration remains 0 ms until you hit @i < 150. On my system, I suddenly started seeing 13 ms or 16 ms in some of my @i < 150 batches. In your 150 printed rows in Query Analyzer, you'll probably see the same timestamp repeated for a number of rows, after which you'll see a jump of 13 or 17 ms (in my case, from 2004-03-20 20:36:45.450 at row 104 to 2004-03-20 20:36:45.467 at row 150). So what does this mean? It means that the SQL Server/Windows clock tick is somewhere between 13 and 17 ms, evidently depending on how close the loop's start is to the beginning of a clock tick. (This also explains how you can get different execution times for the same code.) I should note here that the Windows clock tick depends on many factors, but 1 ms (or less) is too small a clock tick for Windows. And we can't forget I/O operations, either; they depend on disk speed, and each disk has latency.

	BOL informs us that datetime has an accuracy of one three-hundredth of a second (3.33 ms) and that values are rounded to increments of .000, .003, or .007 seconds. In fact, we can think of datetime as a kind of float (think Days.ClockTicks) that needs rounding. You can't have 2, 4, 6, or 8 ms as the clock tick, because rounding to an even increment can't land on the documented values. That leaves 3, 5, and 7 ms as good candidates for the clock tick; 3 gives the best precision, so if the hardware and the OS are able to catch 3 ms, it would be the best choice for the smallest time unit.
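	You can reproduce that rounding pattern with a few lines of Python (my approximation of the documented behavior, using half-up rounding to the nearest 1/300-second tick; SQL Server's actual code may differ):

```python
def to_ticks(ms):
    """Milliseconds to the nearest 1/300-second tick (half-up rounding)."""
    return (ms * 300 + 500) // 1000

def to_ms(ticks):
    """Tick count back to whole milliseconds (half-up rounding)."""
    return (ticks * 1000 + 150) // 300

for ms in range(10):
    print(f'.{ms:03d} -> .{to_ms(to_ticks(ms)):03d}')
# milliseconds 0-1 land on .000, 2-4 on .003, 5-8 on .007, 9 on .010
```

	The round trip lands only on the .000, .003, and .007 endings that BOL describes.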

To find out more about SQL Server Professional and Pinnacle Publishing, visit their Web site at http://www.pinpub.com/

Note: This is not a Microsoft Corporation Web site. Microsoft is not responsible for its content.

This article is reproduced from the May 2004 issue of SQL Server Professional. Copyright 2004, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.