The Integer Problem

Article
04/05/2007

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Features
Problem Children
Everything your mother didn't tell you about SQL Server data types

Wayne Snyder

Sometimes things just don't work the way you expect. Have you ever walked toward a door, pushed hard to open it, and—pow!—slammed right into it? The door opened the other direction, of course. Some SQL Server data types—including integer, character, numeric, datetime, and uniqueidentifier—also behave in ways that might surprise you. Let's look at some of these behaviors and how you can keep them under control.

Let's say your company charges $2 per pound to deliver small packages. You store the package weight in ounces as an integer in your database. To calculate how much to charge a customer named Bob for delivering his 8-ounce package, you might use a SELECT statement like this:

  SELECT package_weight / 16 * $2.00

Bob should owe $1, right? You can use the following script to run a quick check of this answer without even creating a table:

  
DECLARE @package_weight int
SELECT @package_weight = 8  Bob's package weighed 8 ounces
SELECT @package_weight / 16  * $2.00

This code's result shows that Bob gets his package delivered for free (.0000). Now, increase the package weight to 16 ounces. At 16 ounces, Bob will owe your company $2; at less than 16 ounces, he'll owe nothing.

The problem here is that SQL Server keeps intermediate results in the same data type as the operands in an equation. In this case, both operands (8 and 16) are integers. Using integer math, 8 divided by 16 equals 0 (with a remainder of 8), and 0 multiplied by $2 is still 0.

If the two operands have differing data types, SQL Server converts the intermediate and final result to the higher precedence data type. Figure 1 shows the precedence order for SQL Server 2000 data types. For example, dividing an integer by a decimal results in a decimal because decimal is higher than integer in precedence. You can take advantage of this precedence in your delivery problem by changing @package_weight to a decimal:

  
DECLARE @package_weight decimal(6,2)
SELECT @package_weight = 8  Bob's package weighed 
8 ounces
SELECT @package_weight / 16  * $2.00

Now you get the answer you need to stay in business: Bob owes $1 for his 8-ounce package.

If you can't or don't want to change the package_weight column's data type, you can use the CAST() or CONVERT() function to convert the column to a decimal:

  
SELECT CONVERT(decimal(6,2),package_weight) / 16 * $2.00 FROM packages

  
SELECT CAST(package_weight AS decimal(6,2)) / 16 * $2.00 FROM
packages

Alternatively, you can use a shorthand way to get the correct answer to the problem. Perhaps you've seen some mystery code that looks like this:

  
SELECT (package_weight * 1.) / 16  * $2.00 FROM ...

Multiplying package_weight by 1. yields a decimal result. Note that the period after the 1 is required to define a number as a decimal. Or, because the second operator in this example is a literal, you could merely add a decimal point after the second operator:

  
SELECT package_weight / 16. * $2.00

Either way works fine, as long as one of the operands in the division equation can contain a fraction.

The LIKE Clause Problem

The LIKE clause is a wonderful string-search mechanism that most of us have used with great success. But every once in a while, the "LIKE clause problem" shows up in the SQL Server newsgroups. To demonstrate, let's say your users need to perform string searches against the title_id field in the Pubs database. They need the complete functionality that the LIKE clause provides, including the ability to use wildcards. Entering the following SELECT statement against the Pubs database shows that some title_ids begin with the letter P:

  
SELECT * FROM titles WHERE title_id LIKE 'P%'

Now, let's duplicate the LIKE clause problem. Declare a local variable to accept the value the user types in, then use the local variable with the LIKE clause in your SELECT statement:

  
DECLARE @i char(10)
SELECT @i = 'P%'
SELECT * FROM titles WHERE title_id LIKE @i

When you execute the query, you get no records in your resultset. You know the records exist, so what's the problem?

Char columns are fixed-length columns. When you store a shorter string than the column can hold, SQL Server automatically pads the rest of the column with spaces. So SQL Server interprets the SELECT statement above as if you had entered

  
SELECT * FROM titles WHERE title_id LIKE 'P%      '

This SELECT statement asks for title_ids that begin with 'P' and end with eight spaces—and of course, none exist. The solution is simply to use varchar columns for local variables that you use with the LIKE clause. The following code snippet changes the data type of @i from char to varchar and solves the problem:

  
DECLARE @i varchar(10)
SELECT @i = 'P%'
SELECT * FROM titles WHERE title_id LIKE @i

I recommend that you always use varchar local variables or parameters to contain strings you use with the LIKE clause.

Using Approximate Numerics

Approximate numerics can store both very large and very small numbers, so approximate data types can be useful for scientific calculations. SQL Server has two approximate numeric data types: float and real. Float complies with the ANSI SQL-92 standard for approximate numerics and can contain values in the range of -1.79 * 10³⁰⁸ to 1.79 *10³⁰⁸. The definition of a float data type is float(n), where the value of n is a number between 1 and 53 that defines the number of bits SQL Server uses to store the mantissa and to how many decimal places. An n value of 24 or less signifies 4 bytes of storage and 7 significant digits; an n value of more than 24 signifies 8 bytes of storage and 14 significant digits. The real data type is a synonym for float(24).

For many values, these data types don't store the value itself but rather a close approximation of the value. SQL Server rounds values up to the least significant digit. Here's a quick rounding test. Sending the query

  
DECLARE @i float
SELECT @i = 1.1
SELECT @i

to SQL Server returns the following results:

  
------------------------------
1.1000000000000001
(1 row(s) affected)

This example shows that the value SQL Server stored in the float local variable is an approximation. If you use approximate numerics in WHERE clauses, you might not receive the records you need because the stored values aren't what you expect. I've seen applications that use approximate numerics for WHERE clause lookups in which the designers tried to code around the problem. For each column, a record in another table stored how close the value had to be to the intended value for the record to be selected. All the SELECT statements first fetched these boundaries into local variables and included SELECT statements like this one:

  
SELECT * FROM TABLE WHERE approx_numeric_column BETWEEN (@targetval
 - @lowerbound) AND (@targetval + @upperbound)

Because of all that extra work, these applications had serious performance problems. So I recommend that, in general, you don't use local variables or database columns that are approximate numerics in WHERE clauses. At the very least, don't use the = operator with them because the values will be close but not equal.

Dates and Times

Datetime and smalldatetime data types and date expressions always include both the date and the time. If you supply one (but not both), SQL Server uses the default value for the missing part. The default time is midnight (00:00), and the default date is January 1, 1900. (For more details about the internal storage mechanism for datetime columns, see Kalen Delaney, Inside SQL Server, "Solving the Datetime Mystery," September 2000.)

The fact that this data type always includes both date and time can present some surprises when your query includes both dates and times. For example, you might have an orders table like the following:

  
CREATE TABLE orders
(id int identity(1,1) NOT NULL,
ord_dt datetime NOT NULL ...

As users enter orders into the orders table, SQL Server uses the getdate() function to store the current server date and time in each record. If you wanted to see all of today's orders, you might try sending the following query:

  
SELECT * FROM orders WHERE ord_dt = getdate()

But this query returns no records unless an order was placed at exactly the same date and time (down to the millisecond) that SQL Server optimized the SELECT statement. Next, you try

  
SELECT * FROM orders WHERE ord_dt = 'March 11, 2001'

This query also returns no records. Because you didn't provide the time for the datetime expression on the right side of the = operator, SQL Server used the default time, interpreting the query as

  
SELECT * FROM orders WHERE ord_dt = 'March 11, 2001 00:00:00.000'

This query finds all the orders that were placed at midnight on March 11—none.

One way you can get the data you need is to use the BETWEEN operator, as follows:

  
SELECT * FROM orders WHERE ord_dt BETWEEN 'March 11, 2001' AND 'March 12, 2001'

However, if any orders were placed exactly at midnight on March 12, SQL Server would include them as well, but you want orders for only the 11th. The following query does exactly what you want:

  
SELECT * FROM orders WHERE ord_dt >= 'March 11, 2001'
  AND ord_dt < 'March 12, 2001'

If the order time isn't important, you could use the CONVERT() function to force the time value to always be midnight, as the following code snippet shows:

  
DECLARE @tm datetime
SELECT @tm = convert(varchar(30),getdate(),107)
SELECT @tm

Forcing the stored time to midnight would let queries such as the following work as you'd expect:

  
SELECT * FROM orders WHERE ord_dt = 'March 11, 2001'

You might run across another common problem with datetime data types. In this example, the order time matters. Your company has three shifts, with first-shift hours between 7:00 a.m. and 3:00 p.m., and you want to find all orders placed during the first shift from March 7 through March 11. The query

  
SELECT * FROM orders WHERE ord_dt >= 'March 7, 2001 
7 am'  AND  ord_dt < 'March 11, 2001 3 pm'

isn't what you want because it returns data from all shifts between the beginning datetime and ending datetime. Alternatively, you could write a query that includes each day:

  
SELECT * FROM orders
WHERE ord_dt >= 'March 7, 2001 7 am'  AND  ord_dt 
< 'March 7, 2001 3 pm'
OR ord_dt >= 'March 8, 2001 7 am'  AND  ord_dt 
< 'March 8, 2001 3 pm'
OR ord_dt >= 'March 9, 2001 7 am'  AND  ord_dt 
< 'March 9, 2001 3 pm'
...

This solution might be practical for a week's data, but for monthly or annual shift reporting, it would quickly become unwieldy. You could also try to pick the pieces out of the datetime fields, as follows:

  
SELECT * FROM orders
WHERE datepart(yyyy,ord_dt) = 2001
AND     datepart(mm, ord_dt) = 3
AND     datepart(dd, ord_dt) BETWEEN 7 AND 11
AND     datepart(hh, ord_dt) BETWEEN 7 AND 14

The datepart query would return the correct answer, but it might cause performance problems. Avoid using any functions around the column names in a WHERE clause because using functions with a column in a WHERE clause prevents the column from being a search argument. The query optimizer might not be able to see the index statistics, so it would have to use default statistics values to determine whether a given index would be useful. Because SQL Server might not choose the fastest index to satisfy the query, let's keep looking for a better solution.

You could add a column to store the shift number, but sometimes shift hours change. I usually add another column to the table to store the time, which makes queries such as this one much simpler. Let's call the column ord_time and make it an integer. It will store the time of the order in the format hhmm (hours and minutes, using 24-hour time). The ord_dt column still has both the date and time, but this extra column will store the time only. (You can use a trigger to keep this denormalized column up-to-date.) Now your code can read

  
SELECT * FROM orders
WHERE (ord_dt >= 'March 7, 2001 7 am'  AND  ord_dt 
< 'March 11, 2001 3 pm' )
AND     (ord_time >= 0700 AND ord_time < 1500)

Early in a project's design phase, think about how reporting will be done. Often, people don't discover the datetime problem until they begin preparing the reports—at the end of the project, when they believe they're almost finished. So plan ahead for this eventuality.

Extra Use for Uniqueidentifier

Microsoft added the uniqueidentifier data type in SQL Server 7.0. You probably know that you can obtain uniqueidentifier values by using the NEWID() function. But you might not be aware that you can use a uniqueidentifier's column values to return records in a semirandom (quick and dirty) order. If you run the script that Listing 1 shows, the records come out sorted in uniqueidentifier column order, which means SQL Server returns them in the same order every time. This approach might be good enough for a one-time job, but it isn't random. If you need to return the records in a different order each time, see "Tips from the SQL Server MVPs" at http://www.sqlmag.com, InstantDoc ID 19842. Dejan Sarka shows a method to get (pseudo) random record returns even if a table doesn't have a uniqueidentifier column:

  
SELECT * FROM Authors ORDER BY NEWID()

Unlike when you use the script in Listing 1, the record order is different every time you use the NEWID() function directly in the ORDER BY clause.

Although these data type problems aren't earth-shattering, you can still waste time trying to figure out why things don't work as you expect. If you've discovered other data-type idiosyncrasies, forward them to me at wsnyder@ikon.com so that I can share them with others.

Bugs, comments, suggestions | Legal | Privacy | Advertising

The Integer Problem

The LIKE Clause Problem

Using Approximate Numerics

Dates and Times

Extra Use for Uniqueidentifier

Additional resources