Lesson 1: Writing Database Queries
SQL Server 2005 provides several methods for extracting data from a database. You can use special operators to aggregate and combine data. In addition to the data available within the database, you can query remote or heterogeneous data using linked servers. This lesson will cover the different ways you can write database queries.
Write SELECT Queries
The SELECT statement is the primary means of querying a relational database. This is a Transact-SQL statement that can be simple or quite complex depending on which clauses are applied to the SELECT statement. The basic structure of this statement is as follows, in which those statements surrounded by square brackets are optional:
SELECT <comma-delimited list of expressions or column names>[INTO <table name>] FROM <tables, views, or linked servers> [WHERE <search condition(s)>] [GROUP BY <comma-delimited list of columns>] [HAVING <search condition(s)>] [ORDER BY <comma-delimited list of columns> <ASC or DESC>]
Notice that the only portions of the statement that are required are the SELECT and FROM clauses. The simplest SELECT statement would look something like the following:
SELECT * FROM Person.Contact
The previous query requested that all rows, due to the wildcard symbol (*), should be retrieved from the Person.Contact table. Most queries that you write will not be this simple, and there are many considerations that need to be made when designing complex queries.
One of the most frequently used operations in a SELECT statement is a join. Because most databases are normalized, it is often necessary to retrieve data from multiple tables. Normalization is the process in which data is separated into multiple related tables. Joins enable you to create a result set that is derived from one or more tables. A join relates tables based on a key column, such as a primary key or a foreign key. You want the column specified in your join clause to contain values common to both tables. For the AdventureWorks database, you can join the Person.Contact table to the HumanResources.Employee table to retrieve an employee’s title. For example, the following query returns the first name and last name from the Person.Contact table and then the title from the HumanResources.Employee table:
SELECT con.FirstName, con.LastName, emp.Title FROM Person.Contact con JOIN HumanResources.Employee emp ON con.ContactID = emp.ContactID
There are several types of joins, and which one is used depends on what data needs to be returned. Which join type is used can affect the number of rows that are returned. The different join types are as follows:
- INNER This is the default join type and is used if you do not specify a join type. It indicates that all matching rows from both tables should be returned. When this join type is used for the previous query 290 rows are returned.
- FULL In this case, you could have rows returned from either the right or left table that do not meet the join condition. When this happens, the table that does not meet the condition will return a null value for output columns. When this join type is used for the previous query 19,972 rows are returned, which is the number of rows in the Person.Contact table.
- LEFT In this case, you could have rows returned from the left table that do not meet the join condition. The rows from the left table will return null values. When this join type is used for the previous query, 19,972 rows are returned, which is the number of rows in the Person.Contact table.
- RIGHT In this case, you could have rows returned from the right table that do not meet the join condition. The rows from the right table will return null values. When this join type is used for the previous query, 290 rows are returned.
Sometimes, it might be necessary to use a subquery within your SELECT statement. Subqueries are nested queries that can be used within SELECT, UPDATE, INSERT, and DELETE statements. Typically, they are used in place of a column in the select list, but they can also be used in the WHERE or HAVING clauses. Subqueries can be nested several levels deep, up to 32 levels, but doing so would consume a lot of memory and would be hard to read and maintain. The subquery is always surrounded by parentheses.
In many cases, a subquery could be used in place of a join. For example, the following query could replace the one used previously to return the first name, last name, and title of all employees:
SELECT con.FirstName, con.LastName, (SELECT emp.Title FROM HumanResources.Employee emp WHERE con.ContactID = emp.ContactID) As Title FROM Person.Contact con
In this case, the query would return 19,972 rows and would function much the same as a LEFT or FULL join. When you join one or more tables, regardless of the method, you need to be careful to verify that you return the data that you expect. Just because the query executed successfully and returned results does not mean that it returned the data your application requires.
Using Linked Servers
In some cases, it might be necessary to retrieve data from heterogeneous data sources. Distributed queries enable you to access multiple data sources using the OLE DB data provider. This type of capability is important in large companies that might have data located in several different data sources. The data sources could reside on the same computer and in different instances of SQL Server, or they could reside on separate computers. The data source could be something other than a relational database, such as a Microsoft Office Excel spreadsheet. The data could also be from another type of database, such as Oracle or Sybase.
To access a remote or heterogeneous data source, you will first need to create a linked server. Once the linked server is defined, you can perform queries and updates against the data source. Linked servers can be created and accessed through Microsoft SQL Server Management Studio. Once you have connected to your SQL Server instance, you can expand the Server Objects and Linked Servers nodes. Any previously defined linked servers will be listed here. Within the Providers node is a list of several supported data providers. (See Figure 2-1.) Table 2-1 lists each of these data providers along with the data source with which it is associated.
Table 2-1 Data Providers Available for Linked Servers
To create a linked server, right-click the Linked Servers node and select New Linked Server. From the New Linked Server dialog box, you will configure the linked server. Figure 2-2 shows how to use the Microsoft.Jet.OLEDB.4.0 DB Provider to configure a linked server named Employees. In this case, the data is an Access database named Employees.mdb.
Once the linked server has been created, you will see the name appear below the Linked Servers node. At this point, you can execute queries against this data source using the New Query window in SQL Server Management Studio. For example, you could issue the following Transact-SQL command from a query window:
SELECT * FROM EMPLOYEES...Employee
This previous statement would return all records from the Employee table. The SELECT statement uses a four-part name, where the first part is the named of the linked server, and the last part is the table name. Because an Access database does not have catalog and schema names, you just use periods to represent those portions of the name.
How you structure the query will vary depending on the data source. For example, if you are querying an Oracle database as a linked server, you would still use a four-part name, but because Oracle has only one catalog per database instance, the structure of the query would look like the following:
Alternatively, you can create a linked server using Transact-SQL and the sp_addlinkedserver built-in stored procedure. The syntax used for this stored procedure is as follows:
sp_addlinkedserver [@server='Server Name'], [@srvproduct='Product Name'], [@provider='Provider Name'], [@datasrc='Data Source'], [@location='Location'], [@provstr='Provider String'], [@catalog='Catalog']
For example, if you had used sp_addlinkedserver to create the linked server named Employees, the Transact-SQL would have looked like the following:
sp_addlinkedserver 'EMPLOYEES', 'Access 2003', 'Microsoft.Jet.OLEDB.4.0', 'c:\Employees.mdb'
Using the PIVOT and UNPIVOT Operators
The PIVOT operator is one of the new Transact-SQL features available in SQL Server 2005. It enables you to generate an output table, which can then be used as input for a cross-tab report. The PIVOT operator is used in the FROM clause to manipulate one of the input-table expressions. The result is that one column might be aggregated and then returned as multiple columns in the output table.
The PIVOT operator provides an alternative method for aggregating data into multiple columns. Previously, you would have needed to use CASE statements to accomplish the same results. The following is an example of a Transact-SQL statement that uses the PIVOT statement to return a count of purchase orders by vendor for each year:
SELECT VendorID,  As '2001',  As '2002',  As '2003',  As '2004' FROM (SELECT VendorID, PurchaseOrderID, YEAR(orderdate) as ChangeYear FROM Purchasing.PurchaseOrderHeader) r PIVOT ( Count(r.PurchaseOrderID) FOR ChangeYear IN (, , , ) ) As Results Order By VendorId
The query should return 86 rows that are ordered by the VendorID. Table 2-2 shows the first five results.
Table 2-2 Query Results When Using the PIVOT Operator
Another operator, named UNPIVOT, does the opposite of the PIVOT statement. It will return results as rows instead of aggregated columns. This type of operator would be useful if you had results from a PIVOT operator stored in a table, and you needed to view the results differently. UNPIVOT does not return the data as it originally appeared, because the original data was aggregated, but it does return the data in an alternative format.
Using the APPLY Operator
The APPLY operator is also a new Transact-SQL feature in SQL Server 2005. Like the PIVOT operator, APPLY is also used in the FROM clause. However, it is used to apply a table-valued function to each row in the outer table. A table-valued function is just like an ordinary user-defined function, except that it returns a table as a result. The APPLY operator can be one of the following:
- OUTER APPLY Returns all rows that return a result set and will include NULL values in the columns that are returned from the table-valued function.
- CROSS APPLY Returns only rows from the outer table that produces a result set.
To understand how the APPLY operator works; assume you wanted to return the total due on purchase orders for all employees in a certain year. To use the CROSS APPLY operator to accomplish this, you would first create a table-valued function, such as the following:
CREATE FUNCTION fnGetPurchaseOrderTotal(@EmpID int, @Year varchar(4)) RETURNS @RetTable TABLE (TotalDue money) AS BEGIN WITH OrderTree(total) AS ( SELECT sum(totaldue) as OrderTotal FROM Purchasing.PurchaseOrderHeader WHERE EmployeeID = @EmpID AND YEAR(OrderDate) = @Year ) INSERT INTO @RetTable SELECT * FROM OrderTree RETURN END
You would then create a query that used the CROSS APPLY operator to join the results of the table-valued function. This would be done on a row-by-row basis. The following example shows how a query that needs to return employees hired in the year 2002 would look:
SELECT c.LastName + ', ' + c.FirstName as Employee, CONVERT(varchar, tot.TotalDue,1) as 'Total Due' FROM Person.Contact c JOIN HumanResources.Employee e ON c.ContactId = e.ContactID CROSS APPLY fnGetPurchaseOrderTotal(e.employeeid,2002) as tot WHERE tot.TotalDue IS NOT NULL ORDER BY tot.TotalDue desc
This query would return 12 results, which are ordered according to the Total Due, from largest to smallest, in Table 2-3.
Table 2-3 Query Results When Using the APPLY Operator
Using the EXCEPT and INTERSECT Operators
Prior to SQL Server 2005, the options for combining result sets included joins and the UNION operator. We reviewed the join statement earlier; the UNION operator can be used to combine the results of two or more SELECT statements if they have the same structure.
SQL Server 2005 offers two additional operators, EXCEPT and INTERSECT, that can be used to combine and limit result sets. These operators are used to compare the results of two or more SELECT statements and return values that are common to the two. The EXCEPT operator returns any distinct values from the left-side query, and the INTERSECT operator returns distinct values from both the left-side queries and right-side queries.
The queries that are compared must contain the same columns and structure in order to be comparable. But, this can be a useful way of joining two result sets based on distinct values. For example, the following query can be used to identify which products have at least one document assigned to them:
Select ProductID FROM Production.Product INTERSECT SELECT ProductID FROM Production.ProductDocument
This query will return 31 records even though there are 32 records in the Production.Product-Document table. This is because one of the products is associated with two documents, and this query will only return distinct values.
Using Ranking Functions
SQL Server 2005 offers four ranking functions that can be used to indicate where each row falls in the result sets ranking. This can be useful if you need to build an array or rank search results. The four functions are as follows:
- ROW_NUMBER Used to return the row number sequentially as it appears in the result set. The function can use both an order and partition clause. The ranking will start with the number 1.
- RANK Used to return the rank of each row within the partition of the result set. The function also uses an order and partition clause. The ranking will add one to the rank of the number of ranks that preceded it. With this function, you can have two or more rows that receive the same rank.
- DENSE_RANK Used to return the rank of each row within the partition of the result set. The function also uses an order and partition clause. The ranking will add one to the rank plus the distinct rows that preceded it, so the ranking will be sequential.
- NTILE Used to return data based on groups that are then numbered starting at one. This function accepts an integer that specifies the number of groups that each partition will be divided into.
The following is an example of the ROW_NUMBER function, which is used to return the row number for rows ordered by the contact’s last name:
SELECT e.EmployeeID, ROW_NUMBER() OVER(ORDER BY c.LastName) as RowNumber, c.FirstName, c.LastName, e.Title FROM HumanResources.Employee e JOIN Person.Contact c ON e.ContactID = c.ContactID
The top five results from this query are listed in Table 2-4.
Table 2-4 Query Results When Using the ROW_NUMBER Function
Using the COALESCE and ISNULL Functions
The COALESCE function can be used to return the first non-null value for one or more expressions. For example, the address in the AdventureWorks database is split into two fields: AddressLine1 and AddressLine2. If you wanted to return the address for all employees as one line instead, you could use the following Transact-SQL code:
SELECT ea.EmployeeID, COALESCE(addressline1 + ' ' + addressline2, addressline1, addressline2) as address FROM Person.Address a JOIN HumanResources.EmployeeAddress ea ON ea.addressid = a.AddressID ORDER BY ea.EmployeeID
In the AdventureWorks database, employee number 100 has the following values set in
AddressLine1 and AddressLine2.
- AddressLine1: 3029 Pastime Dr.
- AddressLine2: #2
For this record, a column named address would be returned as “3029 Pastime Dr. #2.” For all records in which AddressLine2 was set with a NULL value, the address column would contain the value from AddressLine1 only.
The ISNULL function is similar to the COALESCE function, and in some cases you could use ISNULL as a replacement for the COALESCE function. The ISNULL function is used to replace NULL values with a specific replacement value. Unlike the COALESCE function, ISNULL will only accept two arguments. The syntax for the function is as follows:
ISNULL (check_expression, replacement_value)
This function can be useful when there is only one column that contains a NULL value. For example, assume you wanted to update the Production.Product table in the AdventureWorks database. In this table, the color column contains several NULL values, because not all products are associated with a color. You might want to replace all NULL values with another value, “N/A,” to indicate that the color is not applicable. The following query could be used to accomplish this task:
UPDATE Production.Product SET Color = ISNULL(Color, 'N/A')
When working with NULL values, be careful to distinguish between NULLs and empty strings. You can easily identify NULL values in the query results window of SQL Server Management Studio. For example, the results window displayed in Figure 2-3 shows that the first product contains an empty string. The next four products contain NULL values for the color column. In this case, the UPDATE statement that used the ISNULL function would not have updated the color for the first product.
SQL Server 2005 provides many built-in functions that can be used to accomplish various tasks. For example, Table 2-5 lists all the built-in functions that can be used to aggregate your data.
Table 2-5 Built-in Aggregate Functions
These functions are typically embedded inside of Transact-SQL statements, such as the following query, which is used to return the average list price for all products with a class of ‘L’:
SELECT avg(listprice) FROM production.product WHERE class = 'L'
In addition to using the built-in functions provided with SQL Server 2005, you can create and use your own user-defined functions. Typically, user-defined functions are used to perform a complex task on one or more values and then return a result. You have already seen a user-defined function in the section about using the APPLY operator. In that case, the function returned a table as the result. This is known as a table-valued function. You can also create a scalar function, which is used to return a single value such as a string or an integer.
The following is an example of a scalar function that accepts the Product ID as an input parameter and returns a smallint data type. The function is used to determine the difference between a products inventory and the reorder point.
-- Add this check to the beginning that will -- drop the function if you are trying to replace it IF OBJECT_ID(N'dbo.fnGetProductDiff', N'FN') IS NOT NULL DROP FUNCTION dbo.fnGetProductDiff; GO -- This is the part that actually creates the function CREATE FUNCTION dbo.fnGetProductDiff -- function name (@ProdId int) -- input parameter name RETURNS smallint -- data type returned AS BEGIN -- begin function code -- First get the current quantity -- for this product, which may be in -- multiple locations DECLARE @qty smallint SET @qty = (SELECT SUM(quantity) FROM Production.ProductInventory WHERE ProductID = @ProdId) -- Now get the ReorderPoint for this Product -- Return either the difference as: -- negative, which means there is a shortage -- positive, which means there is no shortage -- zero, which indicates that the amounts are the same -- NULL would indicate that the product has no inventory DECLARE @point smallint SELECT @point = CASE WHEN ReorderPoint = @qty THEN 0 WHEN ReorderPoint <> @qty THEN @qty – ReorderPoint END FROM Production.Product WHERE ProductID = @ProdID RETURN @point END; -- end function code GO
Once the function has been created, you can reference the function in another query. For example, the following SELECT statement will return product numbers along with the difference between the inventory and reorder point:
SELECT ProductID, ProductNumber, dbo.fnGetProductDiff(ProductID) As 'Quantity Diff' FROM Production.Product
If the inventory level is equal to the reorder point, a value of zero will be returned. If the inventory level is below the reorder point, then a negative value reflecting the difference will be returned. If the inventory level is above the reorder point, then a positive value reflecting the difference will be returned. Finally, if no inventory exists for the product, then a NULL value will be returned.
Writing Full-Text Queries
Full-text queries enable you to go beyond the traditional capabilities of a text-based search. These queries go beyond looking for an exact match or even using the LIKE operator to see whether a string matches a specified pattern. Full-text searching enables you to search a portion of the column and look for partial matches of text. This is not the same thing as using the LIKE operator and the wildcard character.
Full-text searching enables you to look for a word or phrase that is close to the search word or phrase. You can also look for two words that are next to each other or multiple words and then rank the results according to specific weightings. Full-text searching uses indexes, which enable it to perform quickly when querying against a large amount of data. It utilizes the new full-text searching engine service named Microsoft Full-Text Engine for SQL Server (MSFT-ESQL). This section will not cover full-text administration, which involves creating full-text catalogs and indexes. For more information about that topic, see the MSDN article “Administering a Full-Text Search” at
The main predicates used in a full-text query are CONTAINS, FREETEXT, and CONTAINSTA-BLE. These predicates are used in the FROM portion of a Transact-SQL query. The CONTAINS predicate can be used to search a column for words, phrases, or prefixes that are near to the search word, derived from the search word, or a synonym of the search word. For example, each of the following SELECT statements will return results from the AdventureWorks database once a full-text catalog and index has been built:
-- Simple search that returns any comments that -- contain the words easy or comfortable SELECT comments FROM Production.ProductReview WHERE CONTAINS(comments, 'easy OR comfortable') -- Proximity term example that returns any comments -- that contain the word easy close to the word conditions SELECT comments FROM Production.ProductReview WHERE CONTAINS(comments, 'easy NEAR conditions') -- Generation term example that returns any comments -- that can be derived from the word bike, which includes -- biking, bicycle, etc. You could also replace the word -- INFLECTIONAL with THESAURUS, but then you would only -- return one record and not two SELECT comments FROM Production.ProductReview WHERE CONTAINS(comments, 'FORMSOF (INFLECTIONAL, bike)')
The FREETEXT predicate can be used to search a column by matching the meaning of the word and not the exact wording. The results from this type of query are a little less precise than if you used the CONTAINS predicate, but such a query can still be useful when the user is not sure what the exact wording will be. For example, each of the following SELECT statements can be used to return results:
-- FREETEXT example that returns any comments that contain -- words similar to praise and recommended SELECT comments FROM Production.ProductReview WHERE FREETEXT(comments, 'praise recommended')
The CONTAINSTABLE predicate is similar to the CONTAINS predicate, except that it returns a table that can be ranked according to weighted values. This can be useful if you need to return a result list to a user that is ranked accordingly. To return the ranking along with columns from the table that is being searched, you need to perform an INNER JOIN on the table that is returned. For example, the following query can be used to return not only the comments, but their ranking according to the weighted values assigned:
-- Weighted term example that returns any comments -- with the words easy or female, but will rank -- the results with the word female higher than the result -- with the word easy. This means you can display -- the higher-ranking items first in a result list SELECT pr.Comments, Results.RANK FROM Production.Productreview pr INNER JOIN CONTAINSTABLE(Production.ProductReview, comments, 'ISABOUT (easy weight(.2), female weight(.6))') AS Results ON pr.ProductReviewID = Results.[KEY] ORDER BY Results.RANK DESC
Lab: Writing Database Queries
In this lab, you will experiment with writing database queries. In Exercise 1, you will create and modify a SELECT statement that performs a join on multiple tables. In Exercise 2, you will use the APPLY operator to return the average cost for products with a product cost history record.
The completed lab is available in the \Labs\Chapter 02\Lab1 folder on the companion CD.
Exercise 1: Use the JOIN Operator
In this exercise, you will begin by executing a query that returns product information. To attain this information, it is necessary to join two other tables. You will execute the query and note the results. You will then modify the query, execute the query again, and compare the results to the first execution.
- Open SQL Server Management Studio.
- Connect to the instance of SQL Server 2005 that contains the AdventureWorks database.
- Select New Query.
Add the following code to the query window:
SELECT p.Name, p.ProductNumber, p.ListPrice, l.Name as Location, pin.Shelf, pin.Bin, pin.Quantity FROM Production.Product p JOIN Production.ProductInventory pin ON p.ProductID = pin.ProductID JOIN Production.Location l ON pin.LocationID = l.LocationID
Select the AdventureWorks database from the Available
Databases drop-down list box, and then click Execute. The results
window should display 1,069 records, which is the exact number of
records in the Production.ProductInventory table. The top 10
results from this query are listed in Table 2-6.
Table 2-6 Partial Query Results After Executing First SELECT Statement
Return to the query window, and replace the previous SELECT
statement with the following:
SELECT p.Name, p.ProductNumber, p.ListPrice, l.Name as Location, pin.Shelf, pin.Bin, pin.Quantity FROM Production.Product p LEFT JOIN Production.ProductInventory pin ON p.ProductID = pin.ProductID LEFT JOIN Production.Location l ON pin.LocationID = l.LocationID
- Select the AdventureWorks database from the Available Databases drop-down list box, and then click Execute. The only difference between this query and the previous one is that we are now performing left joins instead of inner joins. Inner joins, which is the join performed when no join type is specified, will return data that matches the join condition for both the left and right tables. Left joins can return records from the left-hand table that do not meet the join condition. If you were to execute this query, you would have 1,141 records returned instead of 1,069. This is because there are some products that have no product inventory associated with them.
- This exercise demonstrates the importance of using the correct join type when joining multiple tables. Depending on what your query goals are, either query could be correct. If you wanted to return results for products with no inventory, then the second query would be correct. If you did not want to return records for products with no inventory, then the first query would be correct.
Exercise 2: Use the APPLY Operator
In this exercise, you will begin by creating a table-valued function that returns a table as the result. The table will contain the average cost for a particular product because there can be more than one record per product. You will then execute a query that uses the APPLY operator to join the results of the table-valued function.
- Open SQL Server Management Studio.
- Connect to the instance of SQL Server 2005 that contains the AdventureWorks database.
- Select New Query.
Add the following code to the query window:
CREATE FUNCTION fnGetAvgCost(@ProdID int) RETURNS @RetTable TABLE (AvgCost money) AS BEGIN WITH Product(stdcost) AS ( SELECT avg(standardcost) as AvgCost FROM Production.ProductCostHistory WHERE ProductID = @ProdID ) INSERT INTO @RetTable SELECT * FROM Product RETURN END
- Select the AdventureWorks database from the Available Databases drop-down list box, and then click Execute. Ensure that the command was executed successfully.
Replace the code in the query window with the following query and
SELECT p.[Name], p.ProductNumber, CONVERT(varchar, cost.AvgCost,1) as 'Average Cost' FROM Production.Product p CROSS APPLY fnGetAvgCost(p.ProductID) as cost WHERE cost.AvgCost IS NOT NULL ORDER BY cost.AvgCost desc
- The query should return 293 results, which represent each distinct record in the Produc-tion.ProductCostHistory table.
- Which join type will return matching rows from both the right and left tables?
- How would you retrieve data from a data source other than SQL Server?
- What new operator for SQL Server 2005 can be used to create cross-table reports?
- What does the APPLY operator enable you to accomplish?
- What operator(s), excluding a table join, can be used to combine and limit result sets?
- What function can be used to return the first non-null values from more than one expression?
- What is the difference between a table-valued function and a scalar function?
- Which predicate can be used to search a column for words or phrases near to the search word?
Quick Check Answers
- An INNER join, which is the default join type if one is not specified, is used to return data that matches the join condition from both the right and left tables.
- If your data source is an OLE DB data source, then you can create a linked server. The linked server can then be referenced using a four-part name inside of a standard Transact-SQL statement.
- The PIVOT operator enables you to generate an output table. It can be used to replace the need to utilize CASE statements and aggregate functions to accomplish the same result.
- When used in the FROM clause of a SELECT statement, the APPLY operator can be used to apply a table-valued function to each row in an outer table. An OUTER APPLY will return all rows that include NULL values, and the CROSS APPLY operator will return rows from the outer table that provides a result set.
- The EXCEPT and INTERSECT operators can be used to combine and limit result sets. The EXCEPT operator returns distinct values from the left side. The INTERSECT operator returns distinct values from the left and right sides.
- The COALESCE function can be used to return the first non-null value from more than one expression. Alternatively, the ISNULL function only accepts two arguments and can be used to replace NULL values with a replacement value.
- A table-valued function is a user-defined function that returns a table, whereas a scalar function will return a single value, such as a string or an integer data type.
- CONTAINS and CONTAINSTABLE can use a proximity term and the NEAR keyword to return a word that resides close to the other one.
< Back Next >
© Microsoft. All Rights Reserved.