Preventing Reboots, Installing Multiple Updates, and More
Edited by Nancy Michell
Q How can I limit the number of reboots when applying hotfixes and the like to SQL Server and even the server operating system in general?
A First, you may be interested to know that most post-installation reboots are not hardcoded into the installer. They are usually the result of locked files. This means the installer is trying to update files that are currently in use (locked) by some other application or by the operating system. There is a straightforward approach to eliminating these reboots, which is to make sure no processes are using the files in question. How do you do that?
The first step is to determine the specific files that were in use and were locked during the installation. The easiest way is to look them up in the registry at HKLM\system\currentcontrolset\control\sessionmanager\pendingfilerenameoperations. Checking this registry key serves two purposes: it confirms that the reboot request was indeed caused by locked files, and it tells you which files were locked. If you look these up after the installation, but before the reboot, you may be able to figure out what caused the reboot requirement, and take steps to eliminate it during future installs.
You may encounter two types of entries. If a file was read-locked, you'll see a one-line-per-file entry that means the updated file is already on the disk but you need to clean up the temporary copy that's still in use. If a file was write-locked, you'll see a two-line-per-file entry that means the update hasn't happened yet, the application is in an unknown state, and should not be used. It's literally like stopping in the middle of setup. In this case, one line indicates the real file and the other points to a temporary file (which will become the real file on reboot).
The next step depends on many factors, but the idea is to identify what software is using the locked files. Use dependency checkers if you can, or the DLL HELP database on MSDN®
If the software you identify has a service, stop the service manually and then test again. If it's an application, close it and try again. Sometimes multiple applications are using a single file and you need to stop all of them.
The result of these tests (in your test environment) is a list of apps and services to stop in the production environment before installation. Thus the production environment will not require a reboot because no software will be holding locks on files that the installer is installing. Of course, remember to restart each of these once the installation is complete.
The nice thing about this is that locks generally don't change that much. So if you figure out this pattern for your systems once, it'll probably avoid a lot of reboots over the lifetime of your system.
There's one more trick worth mentioning. You may have multiple versions of SQL Server™ or another product installed side-by-side and these may even be at different version levels. You should always update the latest version first. Chances are good that the update will replace everything with the latest versions and all the other instances won't care about locking at all. For example, if you have three instances of SQL Server 2000 SP3, the first one you upgrade to SP4 will upgrade the MSXML3.dll file and need a reboot. Let's say you do that reboot now. You can then upgrade the next two instances to SP4 without needing to reboot, because the updated MSXML3.dll file is already present. The same is true for upgrading any SQL Server 2005 instances before SQL Server 2000 instances, and SQL Server 2000 before SQL Server 7.0 instances. This is a general reboot minimization strategy that really does work.
Looking at the big picture, Microsoft development teams have been doing a lot of work to stop or limit the number of reboots in many scenarios. Since this is the SQL Q&A column, let's take SQL Server as an example.
In SQL Server 2005, the SQL Server team split the Microsoft® Data Access Components (MDAC) code necessary for SQL Server out into SQLNCLI (new files) instead of updating MDAC itself, and moved MDAC back in-band with the OS. How does this help? Well, the Windows® operating system itself uses MDAC and keeps these files locked. That's why most SQL Server service packs and other packages always made you reboot. Now SQL Server can update its version of MDAC without touching the OS's in-use MDAC files, so you don't have to reboot.
Furthermore, there is now a dependency checker and pause built into the SQL Server 2005 SP1 setup. This shows you what's locked and what probably locked it, and gives you the chance right then and there to stop whatever's doing the locking and continue setup without a reboot. This means you can play around with this in real time without needing a test lab. Of course, you can always just continue and take the reboot if you choose or if you're running in unattended mode. Notice how easy this makes figuring out who's locking your files!
In addition, the Microsoft Installer technology also handles the existing PendingFileRenameOperations (PFRs) much better. You may recall that SQL Server 2000 setup would block if any files at all appeared on the registry key we discussed earlier. SQL Server 2005 setup is smart enough to block only if the files belong to SQL Server (meaning there could be a conflict), and even in those cases can often reach into the PFR key and directly update it without needing a reboot. This technology is included at some level in both MSI and update.exe, which are the current standard Microsoft installers that are used to perform all updates.
What are some of the known gotchas out there that cause reboots? MSXML3.dll is one of the biggest. This file is always locked by the Windows SVCHost service, so anything that includes this service will have to reboot. And it's in most MDAC stacks (mdac_typ.exe) as well. Fortunately, MDAC is "in-band" on Windows Server® 2003 and Windows XP SP2 and later. The SQL Server development team did some great work splitting off msxmlsql.dll so that the SQL Server updates won't cause a reboot, but you can expect to see this one occasionally for a while yet. There's nothing you can do here—you can't shut down SVCHost and still run updates.
Also, update.exe packages have a bit of a quirk around the uninstall process. If you uninstall one, it puts a lock (outside of the PFR keys, of course) on the installer's in-place files. That prevents other update.exe packages from being able to run until you reboot to clear that lock. All you can really do is be aware of it and plan to reboot after any hotfix uninstalls you may do.
Intermittency is another issue. Not all programs sit on a file 100 percent of the time. In fact, most of the usage of shared DLLs is point-in-time, meaning that other programs access them only as needed, depending on their own workload and code paths. That means you might test 10 times and see no locked files, then hit a locked file on the production deployment. There's no magic bullet, unfortunately. Just build your test environment as much like your production environment as you can, including load (a good goal anyway) and run enough tests to pop these permutations.
OS service packs and upgrades are going to reboot. And note that on downlevel versions (Windows 2000, Windows XP, and anything earlier), you haven't finished the installation until after the reboot—even if the pendingfilerenameoperations registry key is empty. There are exception package registration operations that may run to install code during that reboot. These aren't even part of the service pack setup, but are embedded commands in your current operating system that are triggered by the upgrade scenarios to allow certain pieces of code you are using to survive them.
Finally, the PFR key does not do file version checks, nor does it enforce order through serialization. This means that it kicks off the file replaces in the order they are listed. But the order they are applied to the hard disk has been proven to be the last to finish (not the first to be started). That makes sense, but it means that having two copies of the same file on the same path, each pointing to different versions, will produce a random result following the reboot. If you ever see this, you need to make sure it gets fixed. If you are unsure, simply run both installers again after rebooting. If you've got the right files, neither will do anything. If you got the wrong ones, then whichever package has the right one will now fix it. Frankly, that's both faster and safer than trying to work out what the desired result is.
Multiple Service Pack Installations
Q I have eight SQL Server 2000 clustered instances on a 4-node cluster and would like to install SP4 with only minimal reboots. Can I install SP4 on all 8 instances one after another and at the end reboot all the nodes?
A Once you've completely installed a single instance of SQL Server SP4—which may require a reboot—on a given Windows server (clustered or not), you should not experience additional reboots when installing SQL Server SP4 on any other SQL Server instances on that box. The reason for this is that all the shared files—tools, MDAC, MSXML, and so forth—are already the right version and locked or not do not need to be further updated. All the non-shared files are used only by the first instance, and it will be shut down by setup. It's worth noting that the most recent instance of SQL Server should be updated first if you have multiple instances of multiple versions.
Running as Network Service
Q Exactly what is the NT AUTHORITY\NETWORK SERVICE account and what is its main purpose? Also, what are the security implications of this account, especially if it is given sysadmin (sa) rights?
A The Network Service account has the same identity (machine identity) as System on the network, but it has reduced privileges on the local box. Thus if a service requires access to network resources, perhaps to resolve account names with the domain controller, it must run either as System, a domain account, or Network Service. The latter is generally the safest choice because it is scoped only to a local box and even there it has reduced privileges.
Making Network Service a sysadmin in SQL Server means that any other service running as Network Service on a local box has full control over the server. This happens, for example, when the SQL Server service account is configured to run as Network Service. Making a SQL Server service account a member of sysadmin is required for functionality of the server (for example, to allow loopback connections). While having Network Service run as admin on SQL Server is not a very good security practice, it may (or may not) be better than using a domain account for that purpose, and it is much more secure than having SQL Server run as LocalSystem.
If you don't want Network Service to have sysadmin privileges, the next choice to explore is using a domain account specifically dedicated to running SQL Server on one or more installations of Windows NT®.
Creating Multiple Databases
Q I am thinking of partitioning data into individual databases, maybe up to 250 databases online at once. I estimate that only 20 percent will be used for active queries at any given time. Are more databases better than fewer?
A The number of databases within a database server instance should be driven by business needs and administrative factors. There is very little overhead in creating multiple databases within a single server instance, but more databases mean more administrative overhead for maintenance tasks like backup and restore, mirroring, user accounts, and user roles. The additional administrative complexities may outweigh the benefits of additional databases.
As a general rule, all data related to an application should be kept in one database, which makes recovery from a failure easier. If an application's data is spread over several databases, then all databases must be recovered during a failover. This can delay the recovery, or even prevent a recovery if one of the databases can't be recovered.
That said, there are reasons to segregate data into multiple databases:
- A subset of data has different backup requirements from other data.
- Certain data needs a separate security context, that is, a different database owner.
- Sometimes data exists for historical purposes and needs to be kept in read-only mode.
Thanks to the following Microsoft IT pros for their technical expertise: Ramon Arjona, Frank De Waelle, Robert Djabarov, Herman Learmond-Criqui, Maxwell Myrick, Ruslan Ovechkin, Uttam Parui, Shashi Ramaka, Gary Roos, Gavin Sharpe, Vijay Sirohi, Jimmie Thompson, and Madhusudhanan Vadlamaani.
Edited by Nancy Michell
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited