SQL Server Connection Basics
Summary: SQL Server 2000 clients connect to SQL Server using a stack of APIs, object libraries, and protocols. Ken Henderson walks through each one and explains how they work and interoperate. (7 printed pages)
The top of the stack is the API or object library layer. Applications connect to Microsoft® SQL Server through either API functions or interfaces exposed by an object library. Examples of APIs used to access SQL Server include ODBC and DB-Library. Examples of object libraries used to access SQL Server include OLE DB, ADO, and ADO.NET. Because ADO ultimately uses OLE DB to communicate with the server, there are really just two object libraries commonly used by Windows applications to communicate with SQL Server: OLE DB and ADO.NET. Connecting through ADO or ADO.NET is certainly more common than doing so over ODBC (although SQL Server's Query Analyzer and Enterprise Manager still connect over ODBC), so I'll discuss the client-side of SQL Server's connection architecture in terms of ADO/OLE DB and ADO.NET. Most apps these days get to SQL Server by way of an object library rather than ODBC or something similar.
OLE DB clients (also known as consumers) communicate with servers and other back-ends by means of a client-side provider. This provider is a set of one or more COM components that translate application requests into network interprocess communication (IPC) requests. In the case of SQL Server, the OLE DB provider that is most commonly used is SQLOLEDB, the Microsoft-provided OLE DB provider for SQL Server. SQLOLEDB comes with SQL Server and is installed as part of the Microsoft Data Access Components (MDAC) stack.
Applications that communicate with SQL Server using ADO do so by first establishing a connection to the server using a Connection object. ADO's Connection object accepts a connection string that specifies the OLE DB provider to be used as well as the parameters to pass to it. You'll see "SQLOLEDB" in this string if an app is connecting to SQL Server using the SQLOLEDB provider.
It's also possible for an ADO application to connect over ODBC to SQL Server. To do this, the app uses the OLE DB provider for ODBC and specifies an ODBC data source that references the target SQL Server in its connection string. In this scenario, the application communicates with OLE DB, and the OLE DB provider for ODBC makes the appropriate ODBC API calls to talk to SQL Server.
ADO.NET applications typically connect to SQL Server using the .NET Framework Data Provider for SQL Server. This native provider allows ADO.NET objects to communicate directly with SQL Server. Typically, an application uses a SqlConnection object to establish a connection, then uses a SqlCommand object to send commands to the server and receive results back from it. The SqlDataAdapter and SqlDataReader classes are typically used in conjunction with SqlCommand to interact with SQL Server from managed code applications.
By making use of the OleDbConnection class, ADO.NET applications can also use the SQLOLEDB OLE DB provider to interact with SQL Server. And they can access SQL Server by way of ODBC through the OdbcConnection class. So, from managed code alone, you have three distinct ways of accessing SQL Server from an application. This is good to know from a troubleshooting standpoint because it can help you isolate connection-related problems you encounter to a particular data access layer or library.
The next layer down in the stack is the Net-Library. A Net-Library provides the conduit between the API or object library an application uses to communicate with SQL Server and the networking protocols used to exchange data with the network. SQL Server provides Net-Libraries for all major networking protocols. These libraries transparently handle sending requests from the client to the SQL Server and returning the server's responses to the client. You can configure which Net-Libraries are available on a particular client using SQL Server's Client Network Utility. Supported client-side protocols include TCP/IP, Named Pipes, NWLink, Multiprotocol (RPC), and a few others.
One Net-Library that's worth special mention here is the shared memory Net-Library. As the name suggests, this Net-Library uses Windows' shared memory facility to communicate between a SQL Server client and server. Naturally, this means that the client and server must reside on the same physical machine.
Because it is able to bypass the physical network stack, the shared memory Net-Library can be considerably faster than other Net-Libraries. Access to the shared memory region is protected by synchronization objects, so the speed of the communication between the client and server is constrained mainly by Windows' ability to signal and unsignal kernel objects and processes' ability to copy data to and from the shared memory region.
You can indicate that the shared memory Net-Library should be used by specifying either a period or (local) as your machine name when connecting. You can also prefix your machine\instance name with lpc: when connecting to indicate that you want to use the shared memory Net-Library.
Understand that, even when connecting to a SQL Server on the same machine, the shared memory Net-Library is not necessarily your best connection option. The directness of the connection between the client and server can limit its scalability in some situations. As with other elements in an application's overall architecture, you should always thoroughly test a given technology solution before assuming that it scales well or is faster than alternate approaches.
When a client connects, SQL Server's user mode scheduler (UMS) component assigns it to a particular scheduler. At startup, SQL Server creates a separate UMS scheduler for each CPU on the system. As clients connect to the server, they are assigned to the scheduler with the fewest number of connections. Once connected, a client never changes schedulers—it will remain on its assigned scheduler until it disconnects.
This has important implications for applications that establish multiple connections to the server. If an application is poorly designed or does not evenly distribute work across its connections, it's possible for the app to cause needless contention for CPU resources between some of its connections, while others remain virtually idle.
Say, for example, that at application establishes four connections to SQL Server that is running on a machine with two processors and that connections 1 and 3 end up on processor 0, while connections 2 and 4 end up on processor 1. If the lion's share of the app's work is carried out over connections 1 and 3, they will contend for CPU 0 while CPU 1 might remain virtually idle. In this situation, there's nothing the app can do but disconnect/reconnect some of its connections and hope that connections 1 and 3 end up on different CPUs (there's no way to specify processor affinity when connecting) or redistribute its workload across its connections such that they are more balanced. The latter is, of course, far preferable to the former.
SQL Server sets aside three packet buffers for every connection made from a client. Each buffer is sized according to the default network packet size specified by the sp_configure stored procedure. If the default network packet size is less than 8KB, the memory for these packets comes from SQL Server's buffer pool. If it's 8KB or larger, the memory is allocated from SQL Server's MemToLeave region.
It's worth noting that the default network packet size for the .NET Framework Data Provider for SQL Server is 8KB, so the buffers associated with managed code client connections typically come from SQL Server's MemToLeave region. This contrasts with classic ADO applications, where the default packet size is 4KB, and the buffers are allocated form the SQL Server buffer pool.
Once connected, client requests typically fall into one of two broad categories: language events and remote procedure calls. Although there are certainly others, most requests from a SQL Server client to a server consist of one of these two types. A language event is a batch of T-SQL sent from the client to the server. For example, if you call the Execute method of an ADO Command object whose CommandText property is set to a T-SQL query and whose CommandType property is set to adCmdText, the query is submitted to the server as a language event. Likewise, if you set CommandType to adCmdTable and call the Execute method, ADO will generate an internal query that selects all the columns in the table identified by the CommandText property and submit it to the server as a language event. On the other hand, if you set CommandType to adStoredProc, calling Execute will cause ADO to submit a remote procedure call request to the server to execute the stored procedure listed in the CommandText property.
Why do you care about whether you're submitting requests to the server as language events or RPCs? You care because RPCs, generally speaking, perform better, especially when you're repeatedly calling the same query with different filter values. Although SQL Server can auto-parameterize plain language event requests, its ability to do so is pretty limited. It will not attempt to auto-parameterize certain types of queries at all. This can cause different executions of what is essentially the same query to incur the cost of plan compilation on the server simply because they filter on different values. Quite often, this is not what you want—you want to compile a new plan for the first execution of a query, then reuse the plan for subsequent executions that happen to feature different parameters.
An RPC, on the other hand, encourages plan reuse by explicitly parameterizing a query rather than relying on the server to do it. A single plan is generated for the first execution of the procedure, and subsequent executions automatically reuse it, even if they supply different values for the parameters. Calling a stored procedure using an RPC versus doing so through a language event not only saves the execution time and CPU resources required for plan compilation, it also makes better use of SQL Server's memory resources because it avoids wasting memory on redundant execution plans.
This is the same reason that sp_executesql is generally preferred to EXEC() when executing dynamic T-SQL. Sp_executesql works by creating a stored procedure using the specified query, then calling it using the supplied parameters. Unlike EXEC(), sp_executesql provides a mechanism that allows you to parameterize dynamic T-SQL and encourage plan reuse. A dynamic query that is executed using sp_executesql has a much better chance of avoiding unnecessary compilation and resource costs than one ran using EXEC().
RPCs, language events, and other types of requests sent from a client to SQL Server are formatted into a SQL Server-specific data format known as Tabular Data Stream (TDS). TDS is the "language" spoken between SQL Server clients and servers. Its exact format is no longer documented, but a client must speak TDS if it wishes to communicate with SQL Server.
Currently, SQL Server supports three versions of TDS: TDS 8.0 (for SQL 2000 clients), TDS 7.0 (for SQL Server 7.0 clients), and TDS 4.2 (for SQL Server 4.2, 6.0, and 6.5 clients). The only version that completely supports all SQL Server 2000 features is TDS 8.0. The others are maintained for backward compatibility.
On the server side, client requests are initially received by listeners SQL Server sets up to listen on particular networking protocols. These listeners consist of networking libraries on the server and the server-side Net-Libraries that provide a conduit between them and the server. You can configure the protocols on which the server listens using the Server Network Utility. Except when dealing with clusters, SQL Servers support the same range of networking protocols as is supported by clients. For clustered SQL Servers, only TCP/IP and Named Pipes are available.
SQL Server sets up one thread per networking protocol on which it listens for client requests, and uses Windows' I/O completion port mechanism to wait for and process requests efficiently. As TDS packets are received from the network, the Net-Library listener reassembles them into their original client requests and passes them on to SQL Server's command-processing layer, Open Data Services (ODS).
When the server is ready to return results for a particular client request, it uses the same network stack over which the request was initially received. It sends results over the server-side Net-Library to the appropriate networking protocol, and these, in turn, are sent back across the network to the client in TDS format.
On the client-side, the TDS packets received from the server are reassembled from the IPC layer by the client-side Net-Library, then forwarded on to the API or object library that initiated the request.
Despite all the pieces involved, the roundtrip between a SQL Server client and server can be quite fast—sub-second response time is not unusual at all, especially when working with the shared memory Net-Library. There are several data points here that are worth keeping in mind as you build and tune your own SQL Server client applications:
- If your app runs on the same machine as your SQL Server, consider using the shared memory Net-Library if you aren't already. Shared memory Net-Library-based connections are often considerably faster than other types of connections. Keep in mind what I said earlier, though: always thoroughly test a solution and compare it with viable alternatives before assuming that it is inherently better or faster. The proof is in the pudding.
- Because a client is assigned to a particular UMS scheduler when it first connects and will not leave that scheduler until it disconnects, it's important to ensure that an application's workload is balanced across the connections it establishes to the server. Unbalanced workloads can cause unnecessary CPU contention and suboptimal resource usage.
- The default network packet size you configure on the server and that clients specify when connecting directly affects how much memory they require on the server and the pool from which it is allocated. Keep this in mind as you configure servers for scalability and speed. Also keep in mind that, by default, ADO.NET apps will have a larger network packet size than ADO apps.
- Generally speaking, you should prefer RPCs to language events when sending requests to the server. Set the appropriate properties in the ADO or ADO.NET objects you're using to facilitate this.
- When executing dynamic T-SQL, use sp_executesql rather than EXEC() when possible. About the only time this isn't possible is when using EXEC()'s ability to concatenate query fragments into dynamic query strings that exceed what can be stored in a single local variable (a rare situation).
- When you run into client-side problems that you suspect may have to do with the object library or API you're using to reach the server, one troubleshooting technique you can use is to change the client-side mechanism you're using so that you can isolate the problem to a particular component. For example, let's say that you upgrade MDAC and begin seeing 17805 errors in your SQL Server error log indicating that malformed TDS packets are arriving from a client-side ADO application. You might try switching the app to use the OLE DB provider for ODBC, if you can do so without much trouble, to see whether the problem is related to the SQLOLEDB provider in some way. Conversely, if your ADO-based app has been connecting over ODBC, you might switch to SQLOLEDB to see if that remedies the problem or at least helps you narrow the scope.
- Along these same lines, it sometimes makes sense to change out the Net-Library you're using when troubleshooting connection-related problems. If you're using TCP/IP, perhaps Named Pipes would be worth trying. For example, if you're running into an issue with your DHCP server and don't have a valid IP address, you won't be able to connect to SQL Server using TCP/IP. By switching to Named Pipes, you can quickly isolate the problem to something specific to TCP/IP. On the other hand, if you switch Net Libraries and still have the same problem, you can probably rule out Net-Library-specific issues. Perhaps the server is down or a piece of the network infrastructure between you and the server is not functioning properly. If nothing else, being able to easily change the Net-Library an app uses without having to change the app itself gives you a tool for helping isolate problems. Even if a particular Net-Library isn't viable for you in the long term, temporarily switching a client to use it can help narrow down where a connection-related issue resides.
SQL Server for Developers
Ken Henderson is a husband and father living in suburban Dallas, Texas. He is the author of eight books on a variety of technology-related topics, including the recently released The Guru's Guide to SQL Server Architecture and Internals (Addison-Wesley, 2003). An avid Dallas Mavericks fan, Ken spends his spare time watching his kids grow up, playing sports, and gardening.