Contents of a Run Book

Updated: November 12, 2002

A run book should contain all of the information you and your staff need to perform day-to-day operations and to respond to emergency situations. This information should include the following:

  • Resource information about the data center and its hardware and software

  • Process information, including step-by-step procedures for operational and emergency processes

The run book should give a staff member everything needed to carry out any process, from performing a backup to failing over to a remote site.

Resource Information

The run book should contain the following types of detailed resource information to help your staff perform routine operational tasks and respond quickly and efficiently to data center emergencies:

  • Contact information — Detailed information about each database administrator (DBA), the building facilities staff, utility companies, and all hardware and software vendors

  • Hardware components — Detailed information about hardware components of the data center

  • Software components — Detailed information about software components of the data center

Keeping this critical resource information current and readily available to your staff reduces downtime when disaster strikes.

Contact Information

Record detailed information regarding each individual or company that you or your staff may need to contact in an emergency. This detailed contact information should include the following:

  • Contact information for each DBA at the primary site, including his or her role in the operational and disaster recovery process

  • Contact information for the building facilities staff, the power company, the phone company, and other applicable utility companies

  • Contact information for your remote site, if you have one, and for all DBAs at that site

  • Hardware, software, and service vendor support phone numbers, e-mail addresses, account numbers, and login and password information for related Web sites

  • Contact information for the people responsible for other applications on the server, including developers, analysts, testers, and managers affected by a change to the application, related systems, or processes

Also record any other contact information that might be useful in troubleshooting and repairing the data center, such as relevant e-mail discussion lists and Web sites.
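
The contact details listed above can be kept in a structured form so that gaps are easy to spot. The following Python sketch is one possible layout, not a prescribed format; the `Contact` class, its field names, and the `missing_fields` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """One run book contact entry. All field names are illustrative."""
    name: str
    organization: str
    role: str                       # role in operations or disaster recovery
    phones: List[str] = field(default_factory=list)
    email: str = ""
    site: str = "primary"           # "primary" or "remote"


def missing_fields(contact: Contact) -> List[str]:
    """Return the names of fields that still need to be filled in."""
    gaps = []
    if not contact.role:
        gaps.append("role")
    if not contact.phones:
        gaps.append("phones")
    if not contact.email:
        gaps.append("email")
    return gaps


# Sample entry with placeholder data; the e-mail address has not been recorded.
dba = Contact(
    name="A. Example",
    organization="Data center",
    role="Primary DBA",
    phones=["555-0100"],
)
gaps = missing_fields(dba)
print(gaps)
```

Running a check like this over every entry before each run book review is one way to catch incomplete records before an emergency does.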

Hardware Components

Record detailed information regarding each hardware component in the data center, including the following:

  • Server hardware

    • Model and serial number

    • Brand and speed of the processor

    • Amount and configuration of memory

    • Version of the BIOS

    • Dates and version numbers of firmware

    • Network interface cards (NICs), including their vendors and model numbers

    • SCSI host adapters or Fibre Channel cards, including their vendors and model numbers

  • Local storage hardware

    • Type, size, and number of drives, including cache if any

    • Logical disk configuration

    • RAID levels

    • Disk controller information (including write cache settings)

    • Dates and versions of firmware for drives and controllers

    • Special options used, such as allocation units

  • Disk arrays and storage area networks

    • Vendor and model

    • Type, size, and number of drives, including cache if any, and the controller to which each drive is connected

    • Logical disk configuration

    • RAID level

    • Number of controllers and number of channels

    • Disk controller information (including write cache settings)

    • Dates and versions of firmware for drives and controllers

    • Special options used, such as allocation units

Also record any other information about the data center hardware that might be useful in troubleshooting and repairing the data center. For example, record a map of the physical wiring of specific drives to specific array controllers.
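
A wiring map of this kind is most useful when it can answer the question "which drives are affected if this controller fails?" The Python sketch below shows one way to record and query such a map. The bay and controller labels are invented sample data, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical wiring records: (drive bay label, controller label, channel).
WIRING = [
    ("bay-01", "ctrl-A", 0),
    ("bay-02", "ctrl-A", 0),
    ("bay-03", "ctrl-A", 1),
    ("bay-04", "ctrl-B", 0),
]


def drives_by_controller(wiring):
    """Group drive labels under the controller they are cabled to."""
    grouped = defaultdict(list)
    for drive, controller, _channel in wiring:
        grouped[controller].append(drive)
    return dict(grouped)


def drives_affected(wiring, controller):
    """List the drives that depend on the given controller, in sorted order."""
    return sorted(drive for drive, ctrl, _channel in wiring if ctrl == controller)


wiring_map = drives_by_controller(WIRING)
affected = drives_affected(WIRING, "ctrl-A")
print(wiring_map)
print(affected)
```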

Software Components

Record detailed information about each software component in the data center:

  • All software

    • Serial numbers and/or license keys

    • The network share location for all software installed on the server, including all service packs, hardware drivers, and hot fixes

    • The onsite and offsite location of all software CDs, including license keys and serial numbers

    • The location of the written documentation for all software

  • Windows 2000

    • Operating system version, with service pack level and hot fixes

    • Server name, IP address, and role in the domain

    • Customized settings, including terminal server and registry settings

    • Information on related systems, including contacts, configuration information, and documentation of data interfaces

    • Local administrator account name and password

  • MSCS

    • Cluster configuration, including all cluster IP addresses, cluster name, cluster nodes, and cluster resource groups

    • User accounts authorized to administer the cluster

  • Microsoft SQL Server

    • Installation information, including service pack levels, hot fixes, instance names, server collation, ports, pipes, configuration options, virtual IP name and address, database file locations, file groups, service logins and passwords, e-mail account, and enabled network protocols

    • Information about file shares used by the SQL Server and SQL Server Agent service accounts and the associated permissions on those shares

    • Database collations if different from the server collation

    • Server roles, database schemas, user accounts, permissions, database roles, custom error messages, and the location of scripts to recreate these objects

    • List of all automated SQL Server Agent jobs (specifically including all backup jobs), what they do, who is notified, the code for each job step, the time or times they run, and the location of scripts to recreate the jobs

    • List of all alerts, what they do, the associated error number or performance condition, who is notified, and the location of scripts to recreate the alerts

    • Linked server, remote server, replication, and log-shipping configuration information

    • Distributed database and distributed partition information, including information such as Data Dependent Routing Tables and distributed transaction marks

    • List and location of all DTS Packages, including associated login and password information

    • List, location, and purpose of all custom code that runs on the server, and the location of a backup copy of this code

    • Names and locations of client tools installed to connect to remote databases (for example, to heterogeneous data sources), and the necessary configuration and connection information

    • List of additional features in use and relevant configuration information, such as Extensible Markup Language (XML) support for Internet Information Services (IIS), Active Directory service support, and Data Source Names (DSNs)

  • Analysis Services

    • Data source and transfer information, including all associated jobs

    • Location and storage format of the Analysis Services repository

    • Analysis Services repository backup job information and storage location

    • Location of data files

    • Security architecture, including logins, database roles, and cube roles

Also record any other information about the software that might be useful in troubleshooting and repairing the data center. For example, record the staff members who are most familiar with custom applications.
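
Several items in the software list call for recording "the location of scripts to recreate" an object. The Python sketch below illustrates one way to audit a job inventory for entries that lack such a script. The `AgentJob` record, its fields, and the sample share path are assumptions made for the example, not a format defined by SQL Server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentJob:
    """One inventoried SQL Server Agent job. Field names are illustrative."""
    name: str
    purpose: str
    schedule: str          # free-text description of when the job runs
    notify: str            # operator notified on failure
    script_location: str   # where the script to recreate the job is stored


def jobs_without_scripts(jobs: List[AgentJob]) -> List[str]:
    """Return the names of jobs that have no recorded recreation script."""
    return [job.name for job in jobs if not job.script_location.strip()]


# Sample inventory with placeholder values.
inventory = [
    AgentJob("Nightly full backup", "Back up user databases", "Daily 01:00",
             "DBA on call", r"\\fileserver\runbook\jobs\nightly_full.sql"),
    AgentJob("Log backup", "Back up transaction logs", "Every 15 minutes",
             "DBA on call", ""),
]
unscripted = jobs_without_scripts(inventory)
print(unscripted)
```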

Procedural Information

Develop and document procedures for each operational and emergency task that you and your staff perform. Whenever possible, develop Transact-SQL scripts for each of these tasks and automate their execution by using SQL Server Agent jobs or DTS packages. For each task described in the following sections, document the detailed steps and scripts for performing it with both SQL Server Enterprise Manager and Transact-SQL.

Operational Tasks

The DBA staff performs many routine operational tasks. To avoid problems, your staff should perform these tasks by using the same procedures each time. Record step-by-step procedures for performing each of the following types of routine operational tasks:

  • Security tasks

    • Changing the domain user account and password used by SQL Server and SQL Server Agent

    • Creating new logins and database user accounts

    • Changing SQL Server user passwords

    • Performing standard and C2 security audits

    • Scripting login information

    • Scripting application roles and recording passwords

    • Scripting linked or remote servers

    • Restoring logins and database users to another SQL Server instance

  • System administration tasks

    • Starting and stopping the operating system

    • Starting and stopping SQL Server services

    • Changing SQL Server configuration settings

    • Setting database options

    • Applying SQL Server service packs

    • Changing the server name

    • Manually backing up a database

    • Manually backing up a transaction log

  • Monitoring tasks

    • Monitoring CPU usage

    • Monitoring disk activity

    • Monitoring memory usage

    • Viewing current locks

    • Viewing current activity

    • Viewing the last command batch for a specified connection

    • Viewing the data and log space information for a database

    • Viewing the oldest active transaction in the database

    • Viewing the procedure cache usage

    • Viewing general statistics about SQL Server activity and usage

    • Identifying and analyzing bottlenecks

  • Data collection tasks

    • Archiving system and application logs in the event viewer

    • Archiving SQL Server error logs and SQL Server Agent logs

    • Archiving SQL Server setup logs

    • Archiving the cluster log file

    • Archiving sqldiag.exe output

    • Capturing output from sysperfinfo and sysprocesses

    • Capturing output from MPS Report tool if available

  • Troubleshooting tasks

    • Testing TCP/IP sockets client connections

    • Testing named pipes connections

    • Troubleshooting deadlocks

    • Troubleshooting failover clustering

    • Troubleshooting replication

    • Troubleshooting log shipping

    • Troubleshooting MS DTC transactions

    • Troubleshooting orphan users

Also add step-by-step instructions for any other tasks that you and your staff perform regularly.
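
Scripted procedures are easier to keep consistent when the statements are generated from one template rather than typed by hand each time. As an illustration for the manual backup tasks above, the following Python sketch builds `BACKUP DATABASE` and `BACKUP LOG` statements as text. It does not connect to a server, the helper names are hypothetical, and the database name and path are placeholders. The `WITH INIT` option is an assumption about the desired behavior (overwrite existing backup sets in the target file); omit it to append instead.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Quote a Unicode string literal, doubling any single quote."""
    return "N'" + value.replace("'", "''") + "'"


def backup_database_sql(database: str, path: str) -> str:
    """Build a full database backup statement that overwrites the target file."""
    return (
        f"BACKUP DATABASE {quote_name(database)} "
        f"TO DISK = {quote_literal(path)} WITH INIT"
    )


def backup_log_sql(database: str, path: str) -> str:
    """Build a transaction log backup statement that appends to the target file."""
    return f"BACKUP LOG {quote_name(database)} TO DISK = {quote_literal(path)}"


# Placeholder database name and backup paths.
full_stmt = backup_database_sql("Sales", r"D:\Backups\Sales_full.bak")
log_stmt = backup_log_sql("Sales", r"D:\Backups\Sales_log.trn")
print(full_stmt)
print(log_stmt)
```

The generated text can be pasted into the run book next to the equivalent Enterprise Manager steps, so that both methods are documented for the same task.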

Emergency Tasks

Record the appropriate response to each type of emergency that may affect the data center. Although the precise tasks vary depending upon the high availability solutions implemented, have a planned and tested response to each of the following types of emergencies:

  • Natural disasters

  • Power outages

  • Server failures

  • Hardware component failures

  • User database corruption

  • System database corruption

  • Application failures

  • Network failures

  • Web server or other necessary server failures

Depending upon the high availability solutions implemented for the data center, the detailed steps will include MSCS failover and failback steps, log-shipping role change steps, transactional replication role change steps, and database restoration steps. These procedures should document the process of determining when to initiate a failover or a role change and how affected users are notified. These procedures must include steps to verify the system's state before bringing a restored system or database online. They should also include escalation steps in case the first attempt to restore availability fails.
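
The requirements in the preceding paragraph (verify before bringing a system online, and escalate when an attempt fails) can be expressed as a simple control flow. The Python sketch below is only a model of that flow under stated assumptions: each recovery attempt, the verification check, and the escalation action are supplied as callables, and every name here is hypothetical. Real procedures would replace them with the documented failover, role change, or restore steps.

```python
def run_recovery(attempts, verify, escalate):
    """Try each recovery attempt in order and return the label that succeeded.

    attempts: list of (label, callable) pairs; each callable performs one
              restore, failover, or role change attempt.
    verify:   callable returning True when the system state is acceptable.
    escalate: callable invoked with the label of each attempt that fails.
    Returns None if no attempt produced a verified system.
    """
    for label, attempt in attempts:
        try:
            attempt()
        except Exception:
            escalate(label)      # the attempt itself failed
            continue
        if verify():
            return label         # verified, so the system may go online
        escalate(label)          # attempt ran, but verification failed
    return None


# Simulated scenario: the first attempt fails, the second succeeds.
state = {"healthy": False}
escalations = []


def restore_from_backup():
    raise RuntimeError("backup media unreadable")


def fail_over_to_remote_site():
    state["healthy"] = True


result = run_recovery(
    [
        ("restore from last full backup", restore_from_backup),
        ("fail over to remote site", fail_over_to_remote_site),
    ],
    verify=lambda: state["healthy"],
    escalate=escalations.append,
)
print(result)
print(escalations)
```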