SQL Q&A: In search of performance
Reducing workload and smoothing mirroring functions is always a good idea—but shrinking databases is not.
Paul S. Randal
Q. I’m working with a team of developers who are changing an application to use SQL Server for data storage. The data was previously stored locally on client machines. Can you give me a list of considerations for the developers so they can drive the least-possible amount of workload to SQL Server?
A. By striving to make the application call down to the data tier as little as possible, you’re taking an excellent approach. Focusing on the application is unfortunately atypical. The primary thing to bear in mind is to have the application retrieve data from SQL Server as little as possible. When it does retrieve data, have it only retrieve the data it needs as efficiently as possible.
Here are some things for your developers to consider about the manner in which the application queries SQL Server data. Paying attention to these will avoid unnecessary workload and negative impact on the CPU, memory and I/O:
- Processing: For the data that’s being pulled from SQL Server, the application should avoid processing the data one row at a time. This is commonly called RBAR, or row-by-agonizing-row, processing. Any time SQL Server sends data to the application, it has a thread waiting for the application to acknowledge the data sent through. RBAR processing can lead to high ASYNC_NETWORK_IO waits in SQL Server. The application should cache incoming data locally, and quickly reply back to SQL Server that it has the data.
- Filtering: The application should avoid filtering data locally before using or displaying that data. It’s much more efficient to push the filter predicate down to SQL Server and have the minimum amount of data returned to the application. SQL Server is very good at filtering data, given the right non-clustered indexes to support the filter predicates.
- One Size Fits All (OSFA): Minimize the amount of table columns being returned to only those necessary. Developers should also avoid trying to build a “one size fits all” dialog. Using a targeted SELECT list instead of SELECT * will cut down on the amount of data being processed and returned. With fewer columns requested, SQL Server might also have more optimal ways of getting to this data, which would improve performance.
- Ordering: If the data being returned doesn’t need to be sorted with an ORDER BY, then avoid specifying ORDER BY as this may cut out a sort operation. Sort operations can often be expensive because they end up requiring a costly sort-spill to tempdb.
- Just in Case: Postpone SELECT operations until they’re really required. If an application is issuing a SELECT just in case, the user clicks an application button. Then it might be wasted processing. It’s better to wait until the button is actually pushed before issuing the SELECT, removing all processing when the button is not pushed.
- Consider Caching: If you’re querying the same data again and again, cache it locally and only issue a new SELECT when the data changes. This is ideal when data doesn’t change frequently or if you don’t require up-to-the-minute data.
Considering these factors can have a profound effect on the amount of work SQL Server has to do, especially if a single change in application query logic is multiplied by hundreds or thousands of instances of the application running simultaneously.
Ask your application development team to review how the application is using SQL Server. This could greatly benefit your existing workload. The root cause of performance problems is too often assumed to be SQL Server, instead of the way the application is using SQL Server.
Q. We’ve been using database mirroring for several years now. We have only recently had any problems. We performed a failover and the mirror database took several hours to come online, which was quite unexpected. Are there any performance counters we can monitor to tell whether this will occur again?
A. Database mirroring has become extremely popular since it was properly introduced in SQL Server 2005 SP1. However, there’s a pervasive problem on client systems. There seems to be an assumption that as soon as you implement database mirroring, you can safely forget it and rely on it to work perfectly when a failure occurs—it will always bring the protected database online on the mirror server with no loss of data and minimal downtime.
While this might be true in some cases, it’s a dangerous assumption. To reduce the potential for disaster, it’s absolutely essential to monitor the size of both the SEND and REDO queue of a mirroring session:
- The size of the SEND queue shows how much transaction log has been generated on the principal server, but hasn’t yet been sent to the mirror server. If it’s not zero, it means the mirroring state isn’t synchronized and there can’t be an automatic failover. Furthermore, the size of the SEND queue indicates the amount of data loss that will occur if the principal database suffers a disaster. You need to monitor this to ensure the size of the SEND queue doesn’t exceed your maximum allowable data loss Service Level Agreement (SLA)—or Recovery Point Objective (RPO)—for the database being mirrored.
- The REDO queue size shows how much transaction log exists in the mirror database that hasn’t yet been replayed on the mirror database. Remember, log records just have to be hardened—not replayed—on the mirror database’s log drive. That’s done as an ongoing process on the mirror server. If a mirroring failover occurs, you can’t access the mirror database until all transaction log records in the REDO queue have been replayed on the mirror database. This essentially means a crash recovery has to occur. The larger the REDO queue, the longer a failover will take. Remember that in the Enterprise Edition, fast recovery comes into play and the database becomes available after the REDO phase of recovery has completed, but before the UNDO phase begins. You need to monitor this to ensure the size of the REDO queue doesn’t exceed your maximum allowable downtime SLA—or Recovery Time Objective (RTO)—for the database being mirrored.
The oldest unsent transaction is another way to monitor the instantaneous amount of data loss you would suffer in the event of a principal database disaster. It applies in all modes of database mirroring, because even if you’re using synchronous mirroring, the principal and mirror can become disconnected or you might pause mirroring.
You can monitor the SEND and REDO queues using the Database Mirroring Monitor in SQL Server Management Studio to set alerts. You can also monitor them directly using the Database Mirroring perfmon object counters Log Send Queue KB and Redo Queue KB.
If you find the REDO queue size growing, this implies the mirror server can’t keep up with the amount of log being sent from the principal server. It could be there’s additional workload on the mirror server that’s preventing the mirror database log from replaying as fast as possible. It may also be the physical hardware on the mirror server isn’t as capable as that on the principal server.
Shrink it up
Q. One of our application vendors is mandating that we run regular database consistency checking (DBCC) SHRINKDATABASE operations against the application databases and tempdb. The vendor insists this is necessary to maintain proper performance. Can you give me some advice?
A. This question comes up pretty regularly. An application vendor might refuse to let you remove regular shrink operations because they’re deemed “necessary for performance.” Shrinking databases causes index fragmentation, consumes a lot of CPU and I/O resources. It also generates a lot of transaction log. This can cause problems for database mirroring, AlwaysOn Availability Groups, replication and anything else that has to ship log records around. There are some circumstances, however, where one-off shrink operations are necessary.
Databases should never be regularly shrunk. Regularly shrinking databases is a bad thing to do because if the database repeatedly grows after being shrunk, all that shrink work is completely wasted effort. It’s akin to having auto-shrink enabled for the database.
Many vendor application teams don’t know these things about shrink. This is often because they’ve ported the application from another database system, and are loath to listen to anyone trying to educate them about how SQL Server works.
I’ll occasionally get involved with a client and the application vendor team. The justifications from the application vendor team are usually along the lines of the following (paraphrasing):
- The indexes in the database are already fragmented, so shrinking doesn’t make it any worse.
- Nobody’s ever complained about performance before, so why are you?
- We have to have a regular shrink because the operations we do cause the database to expand a lot and customers want their disk space back.
- We have to shrink tempdb because the operations we do cause it to grow continually.
None of these are valid reasons for regularly shrinking databases. In fact, it’s documented in KB article 307487 that shrinking tempdb when there’s user activity can lead to tempdb corruption. Also, the “Working with Tempdb in SQL Server 2005” white paper (applicable to all versions) states that: “Shrinking files is not a recommended practice.”
Any time a vendor claims shrinking is necessary, it demonstrates either a fundamental misunderstanding of how you should manage SQL Server or a deficiency in the application’s behavior covered up through regular shrinking. The best way to engage with vendors who mandate regular shrinking is to point them to the Microsoft KB article or white paper. Then they can’t argue that they’re adhering to Microsoft best practices.
Unfortunately, there’s no way to prevent shrink operations if they’re vendor-mandated. Removing the shrink operation would void a support agreement. The best thing you could do is have a SQL Server Agent job that runs every 15 seconds looking for connections that are shrinking databases and then killing them. Killing a shrink operation won’t cause corruption or other problems. This approach might help you stay within the support agreement, while also preventing performance issues on your production server.