Be Prepared: A Guide to SharePoint Disaster Prevention and Recovery
At a Glance:
- The relationship between WSS and SPS
- Available disaster recovery tools
- Backup tools to prepare for the worst
- Recovery from various failure scenarios
Microsoft Windows SharePoint Services (WSS) and Microsoft Office SharePoint Portal Server (SPS) 2003 are rapidly gaining popularity with companies of all sizes. WSS is installed by default with Small Business Server 2003, and it is available as a download for Windows Server™ 2003. SharePoint® is being used to host vast amounts of shared resources. With the widespread use of these technologies, it is important to have a good knowledge of the disaster recovery procedures in case of unexpected events.
The SharePoint brand began with SharePoint Team Services 1.0 and SharePoint Portal Server 2001. These products used different storage technologies, requiring separate disaster recovery methods. SharePoint Team Services relied on Microsoft® SQL Server™ 2000 Desktop Engine (MSDE), while SharePoint Portal Server 2001 used a customized version of the Microsoft Exchange Server Jet database engine. Although these storage technologies were adequate for the first version of each product, they did not allow SharePoint to scale out to support distributed environments.
Microsoft shifted gears for the second wave of SharePoint products, opting for SQL Server database technologies across the board. This decision allowed SharePoint products to scale from small servers using the updated Microsoft SQL Server 2000 Desktop Engine (Windows) (WMSDE), to large server farms using back-end SQL Server clusters. Additionally, the unified storage technology allowed customers to focus their disaster recovery efforts around a single product. However, even with this unified storage in place, SharePoint disaster prevention and recovery is still a complicated topic. In this article, I explore the tools and processes necessary to recover from the most common problems you may encounter in your SharePoint environment.
WSS vs. SPS
Before I go any further, it is important to understand the relationship between WSS and SPS. Although WSS can operate as a standalone workgroup collaboration server, it’s also one of the building blocks of SPS. In fact, one of the first steps of the SPS installation is to install WSS. If you are an outer-space buff like me, it might help to think of WSS as an individual module of the International Space Station, while SPS is the fully assembled station. While you can easily survive in just one module, you will have a much richer experience when the whole station is put together. Therefore, even though the focus of this article is on SPS, you will gain a better understanding of SharePoint disaster recovery no matter what type of environment you support.
Setting Up the Scenario
The example environment for this article includes two Windows Server 2003 member servers in an Active Directory® domain. The first server, which I called TN-SPS, is configured as a front-end server running SPS 2003. TN-SPS is responsible for all SharePoint functions, including Web publishing, search and index hosting, and job processing.
The second server, called TN-SQL, is running SQL Server and holds all of the SharePoint databases for TN-SPS. Such a configuration is referred to as a small farm. A typical small farm can store up to 100,000 documents, host up to 10,000 team or personal sites, and process approximately 37 requests per second. Figure 1 shows a simple diagram of this environment.
Figure 1 Sample SPS Topology
Disaster Recovery Tools
Microsoft provides several tools that allow administrators to recover critical information, ranging from a single team site to an entire SPS server. Conspicuously absent in this collection is a simple recycle bin solution for easy restoration of deleted files. Fortunately, third-party solutions and MSDN® code samples are available to address this shortcoming. See the sidebar "Other Recovery Options" for more information on these solutions.
SharePoint Portal Server Data Backup and Restore tool The graphical SPS Data Backup and Restore tool is responsible for recovery operations at the portal level (see Figure 2). Installed on the SPS server, this tool allows administrators to back up portal databases to a network location or a shared folder on the machine running SPS. Additionally, the Backup and Restore tool can be called from the command line (spsbackup.exe) and included in a script for convenient scheduling.
Figure 2 SharePoint Portal Server Data Backup and Restore Tool
Before running this tool for the first time, you must install the SQL Server Client Tools, which are included on the SQL Server product CD. This is necessary because the Backup and Restore tool essentially performs SQL database backups across the network. However, using this tool rather than traditional SQL Server backups enables you to retain the SPS index. If you have a large index of searchable content, you’ll appreciate the ability to protect this valuable asset.
A shortcoming of the Backup and Restore tool is its inability to perform lossless restores. If you restore a portal backup using this tool, all existing portal data will be overwritten by the restore job. This makes it unsuitable for all but the most critical SPS failures. You can work around this limitation by restoring a portal to a separate, stand-by SPS server, from which you then extract important data. Unfortunately, this time-consuming operation requires an extra investment in hardware.
Stsadm.exe Stsadm.exe is a command-line administration tool that is installed on all servers running WSS and SPS. This tool performs numerous operations, but I’ll focus on its site-specific backup and restore functionality. When used with the -o backup switch, stsadm.exe can back up one or more sites, including the unique MySite for each user.
Here is an example stsadm.exe backup job for a site called Alpha on TN-SPS:
Stsadm.exe -o backup -url http://tn-sps/sites/alpha -filename c:\stsadm_bak\alpha.dat -overwrite
This simple command backs up the Alpha site to a single file called alpha.dat on the local system drive. This is a full-fidelity backup, meaning all security and metadata information is included. When it comes time to restore this data, simply replace the -o backup switch with -o restore. Administrator privileges (both within SharePoint Central Administration and on the server itself) are required for both backup and restore operations using stsadm.exe. In addition, note that stsadm.exe must be run locally on the server in question.
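A scheduled job that protects several sites can repeat the stsadm.exe backup command once per site. The following Python sketch shows one possible way to generate those command lines. It is not part of the SharePoint tooling: the site list, the helper name build_backup_command, and the date-stamped file naming are all assumptions made for illustration, and only the switches from the Alpha example are used.

```python
from datetime import date

STSADM = "stsadm.exe"  # assumed to be reachable on the server's PATH


def build_backup_command(site_url, backup_dir, day):
    """Return the stsadm.exe argument list that backs up one site."""
    # Use the last URL segment (for example "alpha") as the file name stem.
    site_name = site_url.rstrip("/").rsplit("/", 1)[-1]
    # Date-stamped file names are an assumed convention that keeps older backups.
    filename = f"{backup_dir}\\{site_name}_{day:%Y%m%d}.dat"
    return [STSADM, "-o", "backup", "-url", site_url,
            "-filename", filename, "-overwrite"]


if __name__ == "__main__":
    # Hypothetical site list; replace with the sites you need to protect.
    sites = ["http://tn-sps/sites/alpha", "http://tn-sps/sites/bravo"]
    for url in sites:
        # Printed for review only; nothing is executed here.
        print(" ".join(build_backup_command(url, r"c:\stsadm_bak", date.today())))
```

Because the helper returns an argument list rather than a single string, it could be handed to subprocess.run on the SharePoint server itself, where stsadm.exe has to run locally.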
As powerful as it might seem, stsadm.exe is not a replacement for the SPS Backup and Restore tool. At times, stsadm.exe may increase the processing burden on the SQL Server back-end infrastructure. In fact, Microsoft recommends against running stsadm.exe backups during times when users are accessing SPS. Nevertheless, stsadm.exe is still a useful tool for recovering from accidental site or document deletions. These are the two most common types of SPS data loss scenarios, and without stsadm.exe or third-party software, they can cause you a lot of pain. You can find more information on the supported scenarios for using stsadm.exe in the Microsoft article "Supported scenarios for using the Stsadm.exe command-line tool to back up and to restore Windows SharePoint Services Web sites and personal sites in SharePoint Portal Server 2003."
Smigrate.exe Although originally designed to migrate sites from one server to another, smigrate.exe can also be useful for site backup and recovery. One key difference between this tool and stsadm.exe is its lack of support for full-fidelity backups (meaning permissions will be lost after a restore). Like stsadm.exe, this tool applies to both WSS and SPS installations, but it works primarily at the Web level. Smigrate.exe will not back up site collections.
Here is an example smigrate.exe backup job for a site called Bravo on TN-SPS:
Smigrate.exe -w http://tn-sps/sites/bravo -f c:\smigrate_bak\bravo.fwp
Surprisingly, you can also use Microsoft Office FrontPage® 2003 to conduct a back-up operation nearly identical to that of smigrate.exe. This allows site owners and users with site admin privileges to back up their own content using a familiar Office graphical interface.
SharePoint Configuration Analyzer SharePoint Configuration Analyzer (shown in Figure 3) is not a recovery tool itself. However, it can provide valuable information to help you map out a recovery strategy. Although it is not installed by default, you can download SharePoint Configuration Analyzer from the Microsoft Web site (see SharePoint Configuration Analyzer v1.0 for Windows SharePoint Services).
Figure 3 SharePoint Configuration Analyzer
SharePoint Configuration Analyzer works in both WSS and SPS environments. This tool is particularly helpful if you have inherited a SharePoint implementation from someone else, and you have no documentation describing how it was built.
General Back-Up Tools
Even after you create a robust SharePoint backup plan using the tools I’ve described so far, your SPS environment is still at risk for downtime and data loss. You need to implement tools such as Ntbackup, SQL Enterprise Manager, and IISBack.vbs to make sure your SPS environment is recoverable from almost any situation.
Ntbackup Although Microsoft made a valiant attempt to store all SPS data in SQL Server, there are still some files that reside on the local file system. Examples include web.config files in the \InetPub directory, Web Part assemblies in %systemroot%\assembly, and custom templates in various directories under C:\Program Files. As you can imagine, it is important to run local file system backups on a regular basis as part of your SPS disaster recovery plan. You can use any dependable backup tool you prefer, such as Ntbackup. Simply run Ntbackup.exe to create and schedule a full backup job for the local file system. If your SPS server lacks a tape drive, then offload the resulting .bkf file to a server that does. Regardless of how you go about it, make sure to protect the local file system on all SPS servers. For more detailed information on using Ntbackup, see Jay Shaw’s article in the Spring 2005 issue of TechNet Magazine.
SQL Enterprise Manager Given the large amount of data stored in SQL Server databases, it is important to include Enterprise Manager backups in your SPS disaster recovery strategy. SPS stores data in at least four databases, as shown in Figure 4.
Figure 4 SPS Databases
|Database|Description|
|---|---|
|Portalname_PROF|Portal profile database, which contains user profile information and audiences|
|Portalname_SERV|Information on portal services, such as search and alerts|
|Portalname_SITE|Site content database (may be more than one depending on size of SPS farm)|
|SPS01_Config_db|Configuration database (one per SPS farm)|
|SSO|Optional single-sign-on database used in large SPS farms|
Detailed instructions on how to perform SQL Server database backups are provided in SQL Books Online, which is installed by default with SQL Server. You will likely only need to recover the SQL Server databases in the event of a serious server problem, such as a hardware failure or file system corruption. In less extreme scenarios, you should use the Backup and Restore application on the SharePoint Portal Server to initiate portal restores.
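When you review a SQL Server backup plan, it can help to compare the databases you expect for a portal against the ones your backup job actually covers. The Python sketch below does that comparison using the naming pattern from Figure 4. The function names are hypothetical, and the real database names on your server may differ (for example, a farm may have more than one site content database), so treat the generated list as a starting point and confirm it in Enterprise Manager.

```python
def portal_databases(portal_name, config_db="SPS01_Config_db", uses_sso=False):
    """List the databases expected for one portal, following Figure 4."""
    # Profile, services, and site content databases share the portal prefix.
    names = [f"{portal_name}_{suffix}" for suffix in ("PROF", "SERV", "SITE")]
    names.append(config_db)      # one configuration database per farm
    if uses_sso:
        names.append("SSO")      # optional single-sign-on database
    return names


def missing_from_backup(expected, backed_up):
    """Return the expected databases that the backup list does not cover."""
    # SQL Server database names are compared without regard to case here,
    # which assumes a case-insensitive server collation.
    covered = {name.lower() for name in backed_up}
    return [name for name in expected if name.lower() not in covered]


if __name__ == "__main__":
    expected = portal_databases("Portalname")
    # Hypothetical list of databases included in the current backup job.
    print(missing_from_backup(expected, ["Portalname_PROF", "SPS01_Config_db"]))
```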
IISBack.vbs The final piece of the SPS recovery puzzle is a regular backup of the IIS metabase. The IIS metabase contains critical information about virtual servers that is not covered by any of the SharePoint backup tools. Without a functional IIS metabase, all portal content is inaccessible. Although IIS 6.0 provides automatic metabase backups, it is likely that by the time you discover the metabase is the source of your problems, the automatic backups will all have been taken after the damage occurred and will be too recent to be useful. To avoid this problem, you should maintain an archive by scheduling daily metabase backups using the IISBack.vbs script, which is included with all editions of Windows Server 2003. Here is a sample IIS metabase backup job using IISBack.vbs:
C:\windows\system32\cscript IISBack.vbs /backup /b TNSPSBackup
Possible Disaster Recovery Scenarios
Now that I’ve discussed the tools necessary to protect a SharePoint environment, let’s take a closer look at some potential disaster recovery scenarios. Although there are many available third-party tools you can use for disaster recovery, I will focus on Microsoft utilities in the following scenarios.
Individual document recovery Recovering an accidentally deleted document is the most common recovery situation you will face. If the deletion is reported quickly, you can simply restore the affected site in-place using stsadm.exe. However, if much time has passed since the actual deletion, using stsadm.exe in this manner might result in lost data. In this case, you will want to restore the affected site to an alternate portal. Such an operation is not as quick as an in-place restore, but it works well enough in most circumstances.
Here are the steps necessary to recover a document that was accidentally deleted from a sample site named Delta:
- Use IIS Manager to create a new Web site on TN-SPS. If you do not have multiple IP addresses available for this server, simply configure the Web site to use an alternate port (in this case, TCP port 4321).
- Use the SharePoint Central Administration Web console to create a portal on the new Web site. For this example, I’ll call this portal DRPortal.
- Use stsadm.exe to create a new, blank site on DRPortal. The following command will accomplish this task with very little effort:
Stsadm.exe -o createsite -url http://tn-sps:4321/sites/delta -ownerlogin contoso\administrator -owneremail email@example.com
- Now use stsadm.exe to restore a backup of the Delta site to the newly created alternate site:
Stsadm.exe -o restore -url http://tn-sps:4321/sites/delta -filename c:\stsadm_bak\delta.dat
- Log on to the newly restored site and retrieve the deleted document.
- Delete the new Delta site from DRPortal once you are finished to recover disk space. It might be wise to leave the new DRPortal around a while for future recovery operations.
You cannot restore the Delta site to an alternate location on the same portal (for example, tn-sps/sites/deltarestore). If you try this, the restore operation will fail since only one copy of a site is allowed per content database. By creating a separate DRPortal, I also created a new content database and worked around this issue.
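The two stsadm.exe commands in the procedure above must target exactly the same alternate URL, which is easy to get wrong when typing them by hand. The following Python sketch assembles both commands from one set of inputs. It is only an illustration: the function name alternate_restore_commands and its parameters are hypothetical, the values in the usage section come from the Delta example, and the switches are limited to the ones shown in the steps.

```python
def alternate_restore_commands(alt_root, site_path, backup_file,
                               owner_login, owner_email):
    """Build the createsite and restore commands for an alternate portal."""
    # Join the alternate portal root and the site path into a single URL.
    alt_url = alt_root.rstrip("/") + "/" + site_path.strip("/")
    create = ["stsadm.exe", "-o", "createsite", "-url", alt_url,
              "-ownerlogin", owner_login, "-owneremail", owner_email]
    restore = ["stsadm.exe", "-o", "restore", "-url", alt_url,
               "-filename", backup_file]
    # Order matters: the blank site is created before the restore runs.
    return [create, restore]


if __name__ == "__main__":
    commands = alternate_restore_commands(
        "http://tn-sps:4321", "sites/delta", r"c:\stsadm_bak\delta.dat",
        r"contoso\administrator", "email@example.com")
    for command in commands:
        # Printed for review only; run them locally on the SPS server.
        print(" ".join(command))
```

Generating both commands from a single alternate root keeps the port number and site path consistent between the createsite and restore steps.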
IIS metabase recovery The second recovery scenario involves the IIS metabase. In this scenario, an organization hires a security consultant to conduct internal vulnerability assessments. During his security sweep, he finds IIS running on TN-SPS. Not realizing that IIS is required for SharePoint, he uninstalls the service by using Add/Remove Programs. About 30 seconds later the help desk starts getting calls from users complaining of errors accessing SharePoint. After discussing the issue with the security consultant, I immediately reinstall IIS on TN-SPS. However, this is not enough to restore SharePoint to full functionality. Users are now seeing the dreaded "Under Construction" page when they attempt to access SharePoint.
I need to restore the IIS metabase to replace the SPS virtual servers and extensions. Fortunately for me (and the security consultant), I make daily IIS metabase backups on TN-SPS. I simply issue the following command on TN-SPS, and the portal is restored to working order:
C:\windows\system32\cscript IISBack.vbs /restore /b TNSPSBackup /v HIGHEST_VERSION
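The restore only works if it names the same backup that the daily job created. As a rough sketch, the Python helpers below derive both IISBack.vbs command lines from a single backup name so the two cannot drift apart. The helper names are hypothetical, the cscript path is copied from the commands in this article, and the switches are the ones already shown.

```python
CSCRIPT = r"C:\windows\system32\cscript"  # path as used in this article


def iisback_backup(name):
    """Command line for the scheduled daily metabase backup."""
    return [CSCRIPT, "IISBack.vbs", "/backup", "/b", name]


def iisback_restore(name, version="HIGHEST_VERSION"):
    """Command line that restores the named metabase backup."""
    # HIGHEST_VERSION selects the most recent backup stored under this name.
    return [CSCRIPT, "IISBack.vbs", "/restore", "/b", name, "/v", str(version)]


if __name__ == "__main__":
    backup_name = "TNSPSBackup"  # the name used in the examples
    print(" ".join(iisback_backup(backup_name)))
    print(" ".join(iisback_restore(backup_name)))
```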
Recovering from hardware failure Separating the SQL Server databases from the front-end SPS services provides many benefits, the most important being disaster resiliency. Let’s look at hardware failure scenarios for both servers in the example lab. Unfortunately, I don’t have enough space here to offer step-by-step instructions for each hardware failure scenario, but the information I can offer will give you a high-level overview of the recovery process.
Suppose my front-end SPS server (TN-SPS) is rendered unbootable by a hardware failure. However, my database server (TN-SQL) is running just fine. I simply build a replacement server, install SPS, and reconnect to the existing databases on TN-SQL. There is one important caveat in this scenario, though. The configuration database needs to be deleted from TN-SQL before I try to reconnect the new SPS server. Don’t worry—a new database will be created as part of the new SPS setup. When reconfiguring the replacement server, select the setup option to restore a portal instead of creating a new one. Make sure to type the database names as they exist on TN-SQL or the restore operation will fail. Once completed, TN-SPS will be up and running.
Here’s another scenario I hope never happens. My server that hosts SQL Server (TN-SQL) is destroyed by a water leak. A replacement server is quickly built to take its place. The first recovery step is a full restore of the file system and system state (remember to build the replacement server with the same name, and do not join the domain). Next, I restore all SQL Server databases related to SPS using Enterprise Manager. Now that TN-SQL is back online, I can reboot TN-SPS to reattach to the newly restored databases. Once TN-SPS comes back up, SharePoint will appear exactly as it was when I last ran a SQL Server backup.
SharePoint Portal Server and Windows SharePoint Services are becoming increasingly popular tools for business collaboration. As reliance on these platforms for critical data storage increases, so does the importance of having solid backup and recovery solutions in place. With this article, I’ve covered some of the key tools necessary for your SharePoint administrator toolkit. For more information, take a look at the backup and recovery section of the SharePoint Products Resource Kit located at Disaster Recovery in SharePoint Products and Technologies.
Other Recovery Options
Several solutions exist to provide granular item backup and restore capabilities in WSS and Microsoft Office SPS 2003. Here are a few of the more popular ones:
AvePoint: DocAve 3.1 Item Level Backup (http://www.avepoint.com) DocAve 3.1 Item Level Backup provides granular item recovery for either WSS or SPS 2003 (see Figure A). DocAve 3.1 is a graphical application with a robust yet somewhat cumbersome interface. However, it performed admirably in all my testing. DocAve 3.1 is licensed based on the number of sites.
Figure A DocAve 3.1 Item Level Backup
CommVault: Galaxy Backup and Recovery for SharePoint (http://www.commvault.com) More than just a granular item recovery tool, Galaxy Backup and Recovery provides a suite of backup, recovery, and archiving tools for Microsoft environments (see Figure B). In addition to providing full server and database recovery options, Galaxy lets administrators grab individual files from a backup job and restore them to their original location, or to an alternate one. My favorite feature is the ability to search on a particular word or phrase to help locate the accidentally deleted document. CommVault pricing varies depending on the number of servers and applications to be protected.
Figure B CommVault Galaxy Backup and Recovery
Recycle Bin Code and Guidance If you are fortunate enough to have developers working for your organization, or if you are a closet developer yourself, you’ll find valuable sample code and guidance for creating your own recycle bin solution in the February 2005 issue of MSDN Magazine (see SharePoint: Add a Recycle Bin to Windows SharePoint Services for Easy Document Recovery). If you enjoy working with .NET event handler classes and WSS object models, this should get you started on a custom-built solution.
Jeff Centimano is a Windows Server MVP and Principal Consultant for a Microsoft Gold Partner. Jeff maintains an IT-focused blog at cgenius.blogspot.com and can be reached at firstname.lastname@example.org.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.