Change, Configuration, Release Management
This chapter introduces the processes and procedures that allow a database administrator (DBA) to manage changes to any part of the Microsoft SQL Server 2000 environment, including database code, objects, system processes, server configuration, or hardware. This chapter also covers configuration management, which is a crucial part of manageable change, and provides a run book template.
On This Page
To design a supportable system for managing change, configuration, and releases, you will need to identify people who fill the following roles:
Database system engineer (full life-cycle DBA), or one development DBA and one production DBA
Quality control team
You will need access to all resources documenting change processes, and access to information regarding the database server, from the database application to the offsite backups.
Before you can manage change, you must first build a test lab and develop testing processes and procedures.
Technology You Should Be Familiar With
You should be familiar with Microsoft Visual SourceSafe version control software, or a similar source code tool.
Processes and Procedures You Should Be Familiar With
You should be familiar with how to generate scripts, how to observe server configuration, and how to write Transact SQL (T-SQL) code.
Change management is the practice of administering changes with the help of tested methods and techniques in order to avoid new errors and minimize the impact, if any, on service level. Change differs both in its impact and by type.
The environment for handling change includes a development environment and a separate staging environment that is either identical to or closely simulates the production environment. After a change has been planned and developed, it should be tested for quality assurance in a staging environment, and then should be implemented in the production environment. The policies and processes that are enforced determine the procedures that must be used as a change goes through these phases.
The process for managing and handling change may vary depending on the impact and type of change, but it begins by controlling the development and staging environments. The DBA must have input in the design and planning of system components, including the data access code, database objects, and the server architecture, and its configuration.
The DBA must also be involved in evaluating the proposed changes during the quality assurance testing stage. The final stage in the change process is managing the implementation process, so that only known and tested factors are introduced to the system. A post-implementation review, which includes the DBA group, allows those involved in the change process to learn from the process before it begins again.
This is an area where Microsoft Solutions Framework overlaps with Microsoft Operations Framework from the DBA's perspective. The change can be application driven, or it can be operations driven. So, the DBA may begin in the design phase of the MSF cycle, and end up going through MSF and through part of MOF to implement the change. This same cyclical approach also applies to the way that a DBA maintains the database. By constantly evaluating the value of potential changes to either the database servers or the processes by which they are supported, the DBA is essentially continuously moving full circle through the Operations Framework.
Lack of change management is a major cause of failures in the data center. Therefore, the priority must be to document, test, and manage the deployment of all database changes. These include changes that are made to a stored procedure or function code, the physical database structures, the data, the storage components of the database, or the server itself.
Managing the Environment
This section describes two approaches to managing database servers:
The final product produced by the developers is distributed as a packaged product.
Custom-built applications are built and enhanced within a company for the purpose of implementation into that company's data center.
The crucial difference between these two approaches is the type of testing required and the method of deployment used in each.
Managing Database Change for Packaged Products
Managing database change for packaged products is the activity leading to change that results in a product that is distributed to users.
Development and Unit Testing
If the database is developed and altered by a DBA you can manage change in one of two ways:
Run one central development server where each developer works in his or her own schema but does not alter dbo objects.
Run a server in a unit test environment, where developers test their own code in isolation prior to handing it off to the staging environment.
With the second method, changes are implemented through scripts provided by the developers, who are each running Microsoft SQL Server 2000 Developer Edition. If small datasets are not adequate for the developer's work, larger tables can be located in a special database created on the unit-test server. The developer can access this database through a linked server and the use of a local view with the same name as the table would have if it were local.
Unit testing on a central server in the development phase of the project allows you to manage the milestones of your project, and gives you a place to work out any problems before moving to the staging environment, where all installations are run by fully automated scripts or install/uninstall programs. Any changes made to these scripts or programs must be checked into your version control source code.
Quality Assurance, Integration Testing and Staging
When you are distributing packaged products, you must have a staging environment that allows you to test the installation and supported configurations for your product. A separate integration test environment may also be required. The integration test environment allows you to formally test how well the changes you have made will work with the rest of the system. It is a good practice to keep configuration information on each test server/instance, so that you can completely document what was configured during all stages of testing.
Managing Database Change for Custom-Built Applications
When the final goal is implementation of the change within the company, adopting change management practices requires that the DBA exercise control over the database servers used in development/testing and staging. There must be policies in place to prevent the development and production implementation of applications that have not been formally reviewed and tested for quality. The reason for this is to ensure that the operations staff can appropriately support all databases essential to the business. Another key reason is to ensure that you do not end up with production systems running on development servers.
Development and Unit Testing
The DBA creates and maintains a current set of implementation and rollback scripts, once development has finished, and also maintains a full set of versioned data definition language (DDL)/data manipulation language (DML) in a source control tool. The DBA should create all physical objects in the development environment that are targeted for production, and should review all DML that will be used. Scripts for creating the objects are controlled and versioned by the DBA when unit testing begins.
To ensure that those scripts are thoroughly tested and perform as expected, scripting of changes must begin during development/testing, and continue through staging and deployment.
Without adequate testing, you are not assured of achieving success when you alter the production environment. Appropriate test equipment is essential. The more of your entire system that you can duplicate inside a lab, the better off you will be. At a minimum, you should have servers of a configuration that is comparable to your production environment. At best, you should aim to reproduce your entire data center on a smaller scale inside an enterprise lab.
Levels and Types of Change
MOF defines both levels of change and types of change.
Levels of Change
The levels of change are categorized as major, significant, minor, or standard. The change categorization is determined by the impact to the organization and the resources required to plan, develop, and implement the change. Table 2.1 describes the levels of change in MOF.
Table 2.1 MOF Levels of Change
· Far-reaching or profound impact to the IT environment, requiring a great deal of planning and building.
· Requires many resources to plan, build, and implement.
· Has low impact and requires few resources.
· Follows established and documented approach.
Types of Change
In addition to differing impacts of a change, there are four types of change that are addressed. Table 2.2 categorizes these types of change.
Table 2.2 MOF Types of Change
Any type of replacement of hardware/firmware or change in software version or configuration.
Changes to the database server, schema, or code to enhance an application or optimize the database.
Changes made when something is found to be less than optimal.
Changes made either as a result of a failure of the change management process or operator error, or without the knowledge (and consent) of the DBA.
The Process of Change
This section covers the process of change, from the development environment through everything that must be done in the production environment to properly manage change.
If your company maintains or makes changes to a database application, careful management of the development process is a major factor in ensuring successful deployment of changes to databases and the database environment. The highest priorities in managing change are control of the staging environment and adequate testing. Working with formal staging or quality assurance labs and procedures requires careful planning and preparation, even during development.
A qualified DBA should actively participate in the entire project life cycle, from the proposal through development, testing, staging, deployment, and evaluation. Ideally, this person should also be involved with operating and supporting the system, so that the lessons learned in production can be applied to the project.
By managing the changes that are requested to the database objects, and by carefully reviewing the code created by the developers, the DBA can ensure that the changes developed will work well in the production environment. The DBA should develop or help develop implementation scripts, rollback scripts, and testing scripts, and should create or maintain related database documentation.
For business reasons or technical reasons, the DBA may occasionally need to postpone or reject changes that have been requested. In either case, the business impact of the decision should be evaluated and documented.
Make sure that your change process is modular. You should be able to restart the entire process, stop at any point and then continue, and identify exactly what changes have been made and what have not. For this reason, you may decide to create individual scripts for each logical group of changes and step through them.
A good way to keep track of what you have done is to put auditing code in all of your implementation scripts, recording success or failure in a table. To make this process quicker, you can create a tool that executes the scripts in sequence (for example, you can create a table of scripts to run, and loop through them in a script that uses xp_cmdshells to run them through the osql utility). Or, you can make one large implementation script and include code that ends the implementation if certain errors occur. However, such a script is difficult to restart, unless you also insert code for the restart. If you choose this approach, think about the amount of auditing code you would need to include and then plan what you would need to do to find the error.
Deployment Disaster Recovery
When making changes (even tested changes), always have a contingency plan. Consider what would happen if something went wrong in the middle of the script sequence, or if the database scripts worked but the application team encountered errors and had to remove the changes. Always plan for a minimum of two possibilities: complete removal of all changes (which could require rollback scripts, or a restore, if the application is not designed to support multiple versions running in the same database), or an alteration of your plan due to predictable or unforeseen circumstances. Think through as many scenarios as you can, so you are sure you know what to do. Then, make notes in your scripts so that someone else can make changes in case you are unexpectedly unavailable during the actual implementation.
For a very large implementation, you might consider using a transaction log mark to note when specific changes are made. These marks are stored, labeled, and date stamped in msdb.
Reconfigurations and Upgrades
When you are making configuration and upgrade changes to your database environment, there are a number of tasks you should perform related to planning, user communication and coordination tasks, disaster recovery, testing, application and vendor issues, and so forth. The following list provides:
Every time you implement a new version or a Service Pack, carefully examine the changes documented in the file. What implication do these updates have for your system (code, database, SQL Server, and hardware)? For example, if you are running multiple instances of SQL Server on one server, and you need to apply a Service Pack to the operating system (or one that affects Microsoft Distributed Transaction Coordinator (MSDTC) or Microsoft Search), it will affect all the instances running on that server.
Note any differences in the data files sizes for the test server and the production server. Using this information, you should to be able to extrapolate the length of time the change will take in your production environment.
Plan how you will communicate any downtime to the users, and how you will communicate the change to your internal help desk.
Outline a disaster recovery plan in case a failure occurs during the installation of the Service Pack or another upgrade. Before you make any installations, review and verify your failover and failback plans and your disaster recovery plans.
Install the Service Pack on a test server to familiarize yourself with the process. Make notes on the steps that are listed. How much time passed between taking the server offline and bringing it back online?
Test your in-house applications with the new Service Pack. If the readme.txt file clearly indicates changes to Transact SQL (T-SQL), be sure to check your system for the items mentioned. Also talk to the developers and any owners of mission-critical user applications to make sure that the differences will not cause any unexpected problems.
If you are running third-party applications, and you expect to receive continued support from the vendors, contact them to find out whether they support their application running on the Service Pack you are planning to install.
Determine how will you coordinate (or delay) upgrades between different servers supporting the same system.
There are special considerations for failover clustering and log-shipped standby servers, which are covered in Chapter 4, "System Administration," and on TechNet. Because replication is not recommended for a standby server, the only consideration when using replication is to check the readme.txt file for any replication related notes (and test the replication in your lab first).
Other changes that require careful planning are:
Adding disks to the storage subsystem.
Adding memory, CPU, cards, or other hardware.
Any change to the SQL Server configuration (some of these changes require a restart of the SQL Server services).
Management of the Development Environment
It simplifies the work of the production DBA if the development environment is separate from the production environment, and does not give each developer unlimited rights to the development database environment. A simple way to do this is to create a database for the developers to use, but rather than adding them to the db_owner role, explicitly grant them permissions to write stored procedures, functions, and, at your discretion, views. This way, each developer owns his/her own schema, and the dbo objects remain intact, serving as a unit test environment.
The changes developers make to their own copy of an object will not affect everyone else on the development system until they are finished with it. They will still be able to share their work by granting permissions on the objects they are finished with.
This layer of security isolates and protects developers from one another. It also provides the opportunity to review and optimize code at an early stage.
During the code review process, take the time to make sure that adequate comments are provided in the database code. At a minimum, you want to have a sample syntax line, a statement of purpose, a notation on who wrote the code and for what system, and a list of modifications and associated Service Request numbers. The sample syntax line, which should be valid for testing using Query Analyzer, is the most useful comment. It allows you to select that line and run the code, so you can check the execution plan, resources consumed, and so forth.
An additional benefit of having a separate development environment is that it creates a natural unit test environment in the dbo objects of the development database. It also lets you create and verify your implementation scripts, by implementing them to the dbo. Be sure to make notes at this time. For example, which server/database is the target of the change? Will any changes be required at implementation time, such as pointing code to the production version of a linked server? When you are finished, the developer can then run code in this environment. All dbo objects are part of the unit test environment, although the data is not unless you protect it.
When development is ready to test, completely script all database changes and deploy these on the test server. A best practice is to create a rollback script to remove all changes, test the script, and then reinstall your change scripts. Doing so provides you with two complete tests of your scripts and one test of your rollback plan. For the production implementation, use the same scripts again, and then document the changes that were made.
Important: If for some reason you must deviate from these scripts in production, be sure to alter the scripts to reflect the change and enter a note in the script regarding what was altered, why, and whether you noticed any issues afterward.
Emergency quick fixes follow the same basic method as other changes, except that sometimes the issue is too urgent to undergo strenuous testing. If you are facing this situation, be sure to do a risk analysis, at least informally, before proceeding. At no time should you introduce a high-risk change to an unstable environment without careful testing.
The worst-case situation is when, during a production crisis, people do not keep track of the changes they make to the system or do not give adequate time for the most recent change to take effect. The situation can rapidly escalate as more people become involved in the attempt to solve the problem. All server administrators should be highly trained in resolving critical production issues. When a crisis occurs, priority should be given to the established process, which provides structure and control. Only highly experienced administrators should be actively involved with this type of change; people on site who are not authorized for crisis support should be required to notify a senior DBA that they need assistance. It is worthwhile, however, to encourage less experienced team members to observe the activity, because these situations provide good training, increasing the skill level of the team.
A good practice during a production crisis is to try to make only one change at a time and record observations before and after the change. However, crisis situations sometimes require drastic action. If you do not have a failover server prepared, you may find yourself making multiple untested changes based on previous experience and expertise.
Untested, nonstandard production changes made to avert a crisis should only be handled by highly experienced DBAs, or under the guidance of Microsoft Product Support Services (PSS).
If a crisis occurs, be sure to immediately document all the changes that you made. Document the urgency of the situation, what led you to your decisions, and which change(s) you believe were most effective.
When the environment is stable again, document any remaining issues and enter the event into the incident management system (see the "MOF Incident Management" white paper for more information). It is important that you prevent any further change from being made to the affected server until the configuration is verified against previous configuration documentation.
If several changes are introduced simultaneously, you may be left wondering which one solved the problem. In addition, one of the changes may introduce a different problem. However, the removal of unneeded changes should be planned through the usual processes. If you have designed a system that has a standby server available, and you fail over to the standby during the crisis, leave the original server in its broken state so that you can further analyze the issue, and enter notes in the incident and problem management database for future reference in similar cases.
By analyzing the impact the changes or the implementation of the changes may have on the other parts of the system, you can plan and prepare more effectively. Your analysis should also address any consequences for your service level agreement (See Chapter 8, "Service Management"), taking into account the risks of such an error or change implementation and the potential severity of the consequences. Analyzing the effects of system changes also prepares you to be involved in the planning of upcoming projects. You can use the planning information to analyze the impact of suggested changes, determine what you should do to facilitate those changes, and predict how they will affect the existing environment.
The quality of the test results you obtain depends on how well the test scripts represent the production environment. Creating good test scripts is an exacting process because the scripts must adequately represent your production load during normal and peak usage. For information about creating test scripts, go to the MSDN SQL Server home page at http://msdn2.microsoft.com/en-us/sqlserver/default.aspx.
Test any change that does not fall into the standard change category (see "Levels of Change" earlier in this chapter) on a copy of the production environment by rerunning your test scripts while performing baseline monitoring. Compare the results of the baseline before and after to see what effect the change should have. Be sure to use the same (or equivalent) script for each test to obtain comparable results. If any optimization or preventive maintenance changes are planned, they can be introduced into the staging environment at this point after the application changes have been proven acceptable.
To create a baseline for testing, develop your test script, and then run it on the staging server while you monitor the server in exactly the same way. Unless the application you are testing is new, you can create a trace to replay on the staging server. If your new implementation is such that you need to change the SQL statements before testing, the trace will serve as a starting place to give you an idea what the ratio of query types is.
Testing takes a considerable amount of analysis and can be time-consuming. You can use the tools available on MSDN (http://msdn2.microsoft.com/en-us/sqlserver/default.aspx) to allow configuration of database test scripts, or you can adapt a good profiler trace instead.
Tip Create the trace using the replay template.
If your system has individual roles or login accounts rather than a single application login account, a good practice is to trace individual users to isolate their activity for more thorough testing. You can also use a third-party tool for testing, rather than using a T-SQL-based process.
When the test script is complete, run it on a staging server that currently imitates production while you monitor the system for at least 30 minutes using System Monitor and Profiler/Trace. Then, install the changes and run the same test script (or equivalent, if the system is greatly changed) with the same monitoring setup. Compare the output to gain an understanding of areas that could be improved in the new release.
By monitoring a system with the same counters during the performance of the original production system test script, and then monitoring the same system after a change is made, you can judge whether the changes you are examining pose a detectable threat to the production environment. This is the real value of having a baseline. It provides you with facts regarding previous system performance, which you can use to judge the effects of changes over time.
Before you introduce change into the production environment, especially a priority problem fix rather than a standard reconfiguration or enhancement, be sure to notify all involved parties that changes will be made: the Network Administrator, the help desk, the user community (perhaps through a small group of user-owner contacts), and your operational change control group. You may have others on the list depending on the characteristics of the system you are changing.
After the change is complete, you should communicate the status of your work to the same people.
Alternate System Access During Changes
Although minor changes to the system can be made without taking down the system, deploying a major change without incurring any outage takes planning, and can involve the cooperation of more than one team.
If for any reason you cannot keep the main system online during the change, you may instead fail over manually to a standby system. If, however, you are making extensive changes to the database schema, you cannot allow users to continue entering or changing data on the standby system unless you have developed and tested a plan for capturing those changes and importing them into the new structure.
A common technique is to set a start value for the identity column before switching over and then reset it before copying back, thus creating a range used only on the standby server. If you want to use this method, be aware that if users are online and the copy process takes too long, the users may believe their data was not saved properly and may reenter it, causing duplicate items.
You must be prepared to handle unexpected, unmanaged changes made by system administrators, DBAs, or other IT staff. Fortunately, this preparation does not require any more effort than the associated processes of configuration management and change management.
This process involves determining exactly what unmanaged change or changes were made. If you have already accurately documented your original system configuration, and you have a complete and current version of your database code, you can compare these two sets of information for differences. (For more information about system documentation, see "Configuration Management" later in this chapter.)
This task can be time consuming and monotonous, so it is best to automate it somewhat. For example, you can compare the before and after copies of the files in question. Or, you can write a script that verifies that the install is as it should be (which would suggest that you have saved your preferred configuration to a table for easy comparison).
Although IT staff introduces some unexpected changes, another potential source of unmanaged change is the users themselves. If you are dealing with a new system, make sure that you understand what type of access the users have to the system, so that you know what types of issues may arise. Some common examples are those where an application requires that the users be a db_owner, or where the users have direct access to the tables in the database (in which case they might decide to edit information directly). Ideally, all users should use Windows Authentication, and permissions should then be mapped to application roles. For the sake of security, and to add a layer of abstraction, users and roles should be given access to the underlying data only through views and stored procedures.
It is important that you document the changes made to your system. You should save all e-mail related to the project, especially any showing discarded design ideas that may one day be used if the current design does not work out as planned.
If you have been thorough in creating your change scripts, you can use them as the basis of your documentation. Label the scripts clearly with incident, system, or database information and store them in a central location. Changes that you made without using scripts are more difficult to document, but you should do so for future reference.
Change and Release Management
One of the keys to managing change is using scripts and having procedures in place to keep track of versions.
The DBA should create, test, and organize implementation scripts before they enter the staging environment.
Script Based Installations
A script-based installation is the method that best meets the change management objectives. Scripts allow for repeated, controlled, and highly automated installations that can be easily rolled back at any stage. Some of the advantages of script-based installations are:
Clean install. Scripts can be used to create a new database. A script-based installation can be run with little or no awareness of the environment into which it is being deployed.
Clear install. Scripts can be read and validated by all personnel, from tester to operations manager. This ensures that the correct code is being deployed and that server-specific configurations can be maintained.
Independent install. Scripts do not require an external source for seed data in the way that a replicated environment does. If bcp (bulk copy utility program) files have been provided in the installation set, the installation can occur on any server, anywhere in the enterprise.
Highly automated. Although script-based installations are not as flawless as a full database restore, scripts execute very quickly in a highly automatic fashion. It is very easy to determine if a script ran successfully.
Easy to catch errors. By forcing a release to go through a number of installation steps, it is easy to stop the release at the first sign of trouble. When each script runs, error messages are generated when the code does not conform to the environment as expected. Release personnel can determine whether the code base is suited to the release environment early in the process before all the effects of the release are apparent.
Easy to roll back. Not only can the release be stopped early when using a script-based installation, it is possible to roll back only the steps executed up to the stopping point. This makes the rollback of a problematic release much easier than the rollback of a release that is fully installed in a single step.
Version control is the practice of administering changes with the help of tested methods and techniques in order to avoid new errors and minimize the impact, if any, on the service level.
Source Storage Methods
The easiest way to store code is to put it in a protected folder and use a simple directory structure. Another popular alternative is to use version control software. This has some immediate advantages over directory storage, but it requires a little more administration. Version control software allows you to easily grant read access to developers in a way that is already familiar to them, so they can feel reassured that they have the latest version of the code.
Version control software, however, is not yet integrated with SQL Server, and it can be time-consuming to maintain for a database, especially with regard to change scripts. You will need to be sure that everyone in your group understands how it works and is committed to using it consistently. The main weakness of any storage system for database scripts is that you have to manually update it.
Version control programs have features that allow you to label a set of scripts as part of one release, making it easy to get the most recent copy of all changes. You will need to define and uphold agreements that state how to provide comments for objects to be identified in a certain grouping for a release. Change scripts, which are used to implement changes, are fundamentally different from version scripts, which preserve a record of the create script for the object. Therefore, you must maintain them separately.
To implement this level of source control, choose one target application at time and incorporate your archived scripts from your simple file directory as you go. You can search the directories by clicking the Start button in Microsoft Windows® and then clicking Search. You can search either by file name or by keyword. Version control software also allows you to search code text or comments.
Source Code Tree
One way to organize your code is by application, in a source code tree. For example:
If your company does not maintain a configuration management database, you can use the same basic structure to store information on your servers. For example:
It is also useful to store the related documents for applications, implementations, or server configurations. Store e-mails outlining varying opinions, any design documents whether they are used or not, approvals from Quality Assurance (QA), or any other relevant exchanges. If it later becomes apparent that the system design could have been better, you will have adequate documentation of any alternate designs that were suggested and reasons (if any) that they were discarded. This can save valuable analysis time, which allows you to provide a timely answer even other tasks.
The following is an example of what you could use if you do not already have a change process in place (you could also use an electronic form). After it is signed by the DBA and quality assurance groups, the form becomes a record of the change's history.\
Even in the planning stages, you should be thinking ahead to what could go wrong during your implementation. This risk analysis is vital to the success of your process, and may change what is implemented, or how it is done.
At a minimum, you should make a list of what you think is likely to be a problem, and what would be good alternative actions in the event that things go awry. You must also make sure that when you have implementation problems of a nonfatal nature, you alter the implementation scripts as you use them, adding documentation or code that will help you during a later investigation into the problem. This provides two advantages: first, it allows you to be more prepared to discuss what went wrong; and second, it allows your group as a whole to learn from the implementation experience.
After making a list of potential problems, or after reviewing problems you have encountered in the past, make a chart showing these risks and what you could do to correct each problem. Also list any risks that will prevent the implementation from being completed. No matter what kind of risks you identify, even if you cannot identify any, you must have a rollback strategy for your production implementation.
You should never plan a system change without also planning how to undo it. Unless you take some or all of your system offline during the implementation, your rollback plan must also take into account any ancillary servers you have, such as log-shipped servers, subscribing servers, data ship systems (that is, systems that are not running SQL Server that rely on data feeds from your system),and so forth.
Your rollback strategy is simply a script for how to undo every change you have made, and put everything back the way it was before you started. Script in this situation means a series of steps, written down, which you will perform to roll back your change. This will mostly consist of T-SQL scripts you will run. Even if you plan everything correctly, you may still need to roll back all the changes you have made. You may have identified risks that could cause a catastrophic problem, or something may occur that you did not foresee. Developers or testers may find some problem during the post-implementation test that prevents them from approving the installation. When the decision is made to roll back the installation, you will have a very short amount of time to undo all the changes.
If your database is small enough, you might elect to use a restore of the database as your rollback strategy. If this is an option, you might consider using a marked transaction to coordinate rolling back to the first change in your implementation. In a larger database, however, or in an implementation that includes changes to elements outside the table or code objects, you will need a more detailed plan and possibly even the prewritten code to undo the change. These scripts can be tested during the first unit test or QA implementation.
If the implementation involves additional servers or instances that will not all be upgraded simultaneously, you will need to plan and script how these additional instances will be changed. This type of change can involve more than just change management. For example, suppose that you make a major change to your database (for instance, you merge several databases into one), and you want to delay implementation on your off-site standby, which is normally 24 hours behind the primary server. After you have done your production implementation, the offsite standby is merely serving as an easy way to restore the system to the point in time prior to the change. If a failure occurs on the main site, however, and you need to continue with the new system at the alternate site, you will have a large task to plan and immediately implement. The best practice is to think through all the possibilities and be as prepared as is feasible.
Release and Deployment
Who should deploy the change to the production database? This can be a topic of debate in a group of DBAs. If you have both a development and a production DBA, there are tradeoffs: the development DBA is closer to the change and understands all the related parts, but ultimately the production DBA is responsible for the system. Ideally, all DBAs should be able to act at any point in the life cycle of a project as full–life cycle DBAs. In some situations, the deployment team may handle deploying the change, in which case you must be more rigorous in testing your implementation scripts.
In the end, it's a question of ownership. The person deploying the change should have a good understanding of both the system in question and the change being made. They must have the authority to take drastic corrective action if necessary without undue delay, and must know how to reach all the appropriate people in operations who can assist with operational aspects that are outside the realm of the DBA.
Determining an Implementation Window
If this is a 24x7 system, your project team will have to negotiate an implementation window that is acceptable to all groups affected by the change. In a mission-critical system, even the shortest downtime may be unacceptable, so you may need to provide a read-only database for alternate access. In this case, the team should try to identify a small number of key users and redirect their database connections to a different server through Open Data Base Connectivity (ODBC).
To do this effectively, you will need to have a list of the IT contacts involved, and the user contacts both at the management level and staff level. It is essential for the IT department to maintain credibility with the users by clearly communicating to them the nature of the service interruption and the expected duration. Any change in the schedule or deployment must also be communicated. Also, if you are supplying alternate access, interim support of the read-only system must also be provided, and the users and help desk personnel should be kept informed.
Planning the Implementation
You should compile an implementation plan and distribute it to everyone in Information Systems who will be involved in or affected by the implementation. The plan is simply a list of the steps involved in the implementation, the person responsible for each step, the time that each step is expected to occur, and the person to contact to initiate the next step. Be sure to include cell phone and pager numbers for everyone on site or on call during the implementation.
Try to imagine things that are likely to happen, and make a secondary plan that accommodates these variables. Experience with your application and infrastructure will help you to gauge the level of detail required. Note any situations that would cancel the entire implementation and invoke the rollback plan.
As a group, the implementation team should create a backup plan. The backup plan is a little different from a simple rollback strategy for the database. If anyone's section fails, the group should have an overall plan for evaluating the situation, making a decision, and then proceeding with or canceling the implementation.
Release Readiness Review
The beginning of the operations group's ownership of the system begins with a release readiness review, also called a "go/no go" meeting, which determines whether this system or change to a system is ready for implementation. Although operations staff should be involved throughout the project, this will be the first meeting that is run by them, rather than by the development or quality assurance teams.
The purpose of this meeting is to allow members from each team to indicate their final approval (or disapproval) and any issues they want to raise. This is the last chance to alter the implementation schedule except in the case of a crisis. All groups should be prepared for the deployment. It is important that operations staff members indicate that they are sufficiently knowledgeable and prepared to take on operational support of the product or system.
Any votes of a no-go should be seriously considered. If no resolution can be reached by those present, the meeting should be adjourned with a no-go status until resolution is reached or the objection is overridden by senior IT staff. In any case, the objection, the reasons for it, and any risks that are brought up should be recorded for future reference.
The agenda for this meeting should include:
The readiness of the release itself.
Alterations to the physical environment.
The preparedness of the operations staff and processes.
The installation plan.
The contingency plan.
Potential impacts on other systems.
If you have a formal change control process that requires registration for implementation, be sure that you have followed those procedures. It may be your responsibility to file a change request for the database server. If there is no formal change process in your operations group, you will need to send an e-mail announcement to those in operations who may be affected by the change or by any difficulties encountered during or after the implementation. This would include, at minimum, the support staff, the help desk technicians, and network and security administrators. There should not be anyone who can reasonably claim that they should have been informed and were not.
Coordinating Application Changes
The first step of your implementation should always be a backup of the current system. After this is complete and verified, you are ready to begin. However, the DBA group is rarely the lead on the implementation. Usually the lead is someone from the project team who is responsible for deployment. You should call your contact person and inform him or her that backups are complete and the database team is ready, and then await the signal to begin changing the system.
Deployment and Implementation
Except in the very smallest of organizations, there should always be a second DBA on standby for an implementation. In a large implementation, several DBAs may be involved in the deployment of the system, but you should still have a standby in the event that someone becomes ill, or in case of related or unrelated system failures. Remember, even if another production server fails during an implementation, the deployment must continue as planned. Every reasonable effort should be made to uphold the DBA end of the contract.
In addition to the DBA providing support to an implementation, the DBA should also be sure that people are in place to support him/her. Be sure you have ready the contact numbers of the on-call server room technicians, network administrators, or security administrators whose help you may need during the implementation. Phone numbers of building security should also be available.
Documenting Unplanned Steps and Procedures
Sometimes, despite all your plans, something unforeseen occurs during a production implementation. Regardless of the severity of the problem, you should make a note of it in your implementation script. Record everything, including what code you had to run and what action you had to perform. Do not wait until the crisis has passed to make notes. The sequence of events and steps taken may later be significant, either for support calls you make, for later analysis, or for similar implementations in the future. Remember that meticulous accuracy is more important than anything else at this stage. Due diligence requires that even if you make a mistake, you record that too. Your thoroughness now might save someone else making the same mistake later, and will help the group learn about teamwork as well as technical issues Good teamwork is a vital part of success to an implementation team, which crosses many departments in IT, and it relies on the accuracy of the information provided by team members.
Rollback Plan Documentation
In the event that the deployment fails, you will have to put your rollback plan into action. This rarely happens, but you will appreciate the value of a good rollback plan the first time you have to use it. While you are running your rollback scripts or your database restore, make notes in an e-mail explaining what caused the implementation to be cancelled. It might have had nothing to do with the database at all, but this should still be noted and the appropriate people informed. Remember to inform all the people in Information Systems who were originally notified that the deployment would occur. Unless you have a direct link to the user community, leave that contact to the project managers or to the change control group.
When Your Deployment Is Complete
After you have completed your portion of the implementation, immediately notify your contact person so that the next part of the implementation can proceed. Your task now is to stand by in case something is needed, most notably a rollback. If the implementation is completed successfully, the testers will test the application in the production environment to be sure that everything is working, and the users will be informed that they can resume working in the system. You should make sure that the contact person has an action item to call you to inform you that you are no longer needed. It is a good idea to know what time he or she plans to be finished, and to make known what time you will no longer be available.
After implementation, a meeting should be held to evaluate the success of the project. All aspects of the project should be examined, including the project plan, the design, the arguments, the solutions, the crises, and the final outcome. The point of this process is to learn from the events that occurred. If you have things to add, take your documentation with you to share. Remember to offer feedback about what went well, not just about what went wrong. After this review is complete, the development cycle for that application begins again in the visioning of the next release.
This section covers configuration management in your SQL Server environment, from creating a baseline through what should be included in a run book.
It is worthwhile to develop a standardized configuration for all servers. Although you probably will have servers that deviate from your standard, the standard is still useful because it sets expectations of where to find everything and what settings you can assume are in place. Anything that varies from the standard can then be documented with less effort. This is a simpler approach than trying to remember numerous server configurations individually.
A configuration baseline is a record of the last known good configuration of a product or system. The baseline should capture enough details to enable that product or system to be rebuilt at a later date. Configuration management provides the discipline for clear identification and change of all items that make up the business applications services and IT infrastructure.
The process of establishing a configuration baseline consists of the following activities:
Identification. Identifying infrastructure components and the relationship between components.
Control. Authorization for component changes.
Status accounting. Recording and reporting of CI data.
Verification. Auditing of configuration item (CI) records and infrastructure position.
Inventory. Identification of a configuration item (CI) instance.
Support. Identification of configuration item (CI) attributes that can assist with problem resolution.
Contact maintenance. Authorization and users of the configuration items and/or CI instances.
Configuration Management Benefits
There are several benefits to keeping records about your configuration. The first is to help you rebuild it in the event of a disaster. For this reason, complete run books are important to disaster recovery planning. A run book can contain a subset of the configuration information, or it can contain a very high level of detail, depending on your needs.
On a daily basis, the run book can be referred to as an easy way to get information about a system. If you keep contact and related system information in it, the run book can also help you to identify systems that send or receive data from your system, and people who need to be contacted in emergencies (or who might be willing to answer questions if, for example, you are researching performance problems).
A run book does not have to be an actual printed book, although it is recommended that you print several copies regularly. One copy should be sent out with the backup tapes, so that the system can be restored in the event of a disaster at the main site. In addition to printing the run book, you could also put the information on a Web site. For example, you could have a Web page that shows you a grid of all the latest service packs, hot fixes, CPUs, and so forth. Or, you could put that information into a table or XML and base some self-healing maintenance on that as well. You should also document the organization and procedures for the configuration management of a specific product, project, system, support group, or service.
Creating a Run Book
Crucial to the successful management of change, availability, and disaster recovery planning is the creation and maintenance of a document or collection of documents that succinctly detail the people, processes, procedures, and configuration details related to the system. Such a document is sometimes referred to as a run book, and should be readily kept scrupulously up to date and available to the whole team.
SQL Server Administrative Information
The following list can be used as the basis for a run book or customized operations guide for your database system. The list outlines the configuration items you would use to maintain a high standard of operations support. It is based on SQL Server 2000, but applies equally to other versions in concept. The list has been reviewed and approved by operations and field personnel within Microsoft, and represents the current opinions and best practices of those groups.
To fulfill your responsibilities as the DBA, you must keep a minimum set of information about the system:
Maintain information about maintenance plans: all related scripts, how information is transferred for analysis (if it is), how alerts and errors are handled, and information about remote servers.
Maintain information about database backup files, including their location and type (full database, differential, file, filegroup, or log) and how current the files are. Also record the times that files are backed up to tape (or other removable storage) and where these files are stored, and make notes on related directories, data, or files that must also be backed up. Remember to include the password for each backup if necessary. All related information should be recorded.
Run sqldiag.exe and store the result in a text file. (Do this on a regular basis, and save the files.)
Important: Make sure you know what to look for in the output of sqldiag. If you or your staff is not familiar with it, script the relevant portions separately.
Store your Data Transformation Services (DTS) packages as files to enable an easy modular restore or transfer. Make notes regarding logins and passwords for the packages, and any alternate data sources.
Create a script of all system and/or application users and passwords (including sa). Create scripts for application roles/passwords and linked or remote servers.
Record your software serial number, a copy of the CDs (including all service packs and hot fixes), and a reference on where to find these on network file shares.
Keep a record of all related systems, including contacts, configuration information, and documentation of data interfaces.
Record any custom-made DBA objects that you depend on to maintain or administer the server.
Record hardware and software vendor support phone numbers and account numbers (and any login/password info for related Web sites).
Record contact information for your remote site, if you have one.
Record who to notify in the event that you must recreate standard SQL Server users or reset their passwords.
Use a tool such as Microsoft Visual SourceSafe to manage script versions (including schema, install and rollback scripts, maintenance scripts, and perhaps even the run book). Scripting is especially important for encrypted objects.
Write down contact information. Remember to list these by the role of each person, so that if the person changes jobs, you can still find the correct contact. Ideally, you should record a group name (department, e-mail discussion list, and so forth).
Application System Information
The following application system information should be included:
List all the applications that must be in place for the system to run (either on the server itself or another system if that is required). Include custom-built software.
Document the application's security architecture, including type of logins used, any fixed passwords, and roles permissions,. Note the process for changing passwords on any multiuser logins.
Record contacts for the application: include developers and anyone else (analysts, testers, or managers) who should be involved if a system change is made to the application, or to any related system or process.
The following database component information should be included:
Script all database schemas, collations, jobs, and custom error messages. Anything that can be saved in script form should be scripted and stored historically.
Record information related to distributed databases or partitions (if applicable), such as Data Dependent Routing Tables, distributed transaction marks, and so on.
The following information on storage components should be included:
Note the operating system version, with Service Pack Level and hot fixes.
Keep an exact list of hardware in use, and how that hardware is configured.
CPU information, RAM, BIOS
Record dates and/or version numbers of firmware.
Record physical and logical disk configuration, including RAID levels and disk controller information (including write cache settings), disk type and size, and any special options used (specifically,. allocation units, formerly known as block size).
Keep notes on anything unusual regarding this server, hardware, configuration, or location. For example, what if you have disks on different shelves plugged into the same array controller.
The following information on server components should be included:
Installation configuration of SQL Server, including the installation and Service Pack Levels, and any hot fixes that were applied.
Note SQL Server instance names, IP addresses, ports, configuration options, database file locations, service logins and passwords, e-mail accounts, enabled network protocols, and their order.
Record file share information. Include shares reached by universal naming convention (UNC) names or through any other protocol, to which the service login must have permissions.
Maintain configuration information for any other software that runs on the same server. Make sure complete installation and configuration documentation is available, and/or that correct support personnel (or job titles) are listed as contacts. Also list support numbers and Web sites for each piece of software.
Note any client tools that must be installed for remote database connections (for example, to heterogeneous data sources). Note configuration information.
Document any DSNs that exist on the server.
Document setup of SQL Server 2000 failover clustering, replication, SQL Server 2000 log shipping, and a configuration/topology description.
Document any multi-instance configuration
Keep notes on anything unusual regarding this server. For example, note special features in use such as XML support for Internet Information Services (IIS), Microsoft Active Directory™ support, and so on.
The run book is only useful if it is carefully kept up to date and is stored on a separate system, which must also be treated as highly available. If you can automate the process, do so.
Make sure that you have an up-to-date offsite copy of your run book. Ideally, this copy should be a printed document or an encapsulated program or both, not a backup tape or a database that needs a separately installed GUI. Use a versioning tool to keep track of all changes made to the run book over time, and keep a copy of all software CDs offsite.
The Importance of Standardization
Standardization can be accomplished by establishing technical specifications for products, working methods, and similar components for system uniformity. Use of the standard specifications can be made mandatory for subordinate organizations. Ideally what you want to do is create standards for all aspects of the system and then simply note what qualities of a system diverge from the standard. This is much simpler than documenting each individual system separately.
For example, standardized administrative and support strategies, standardized directories, and standardized hardware and configuration all make it possible to come closer to ideal manageability. The real advantage to having a standards-based system is that decisions can be made more reliably and more accurately based on familiar standardizations.
Configuration Management Database
You can also develop a configuration management database, but this is a large undertaking. In both the case of the configuration management database and the run book, remember that they are tools for managing your work. To decide how much information to put into them, imagine that the data center is flattened and that you must rebuild the servers from whatever you had offsite. Some details would be skipped because the infrastructure of the new data center would be different. However, user and application requirements would be the same.
For example, although there is value in tracking which jack your server is plugged in to, it is not useful to keep this information in a configuration management database. Instead, it belongs in a system owned by Facility or Building Administration (in some cases, these systems contain geo-spatial data to show the complete facility layout). Although it can be beneficial to create some links between these two different data repositories, it is important to keep the different purposes of each clear. Nothing should be introduced that will delay or complicate the process of updating the configuration documentation. If you are concerned about when, why, or by whom changes are made, create an automated audit process for instance by using triggers and log tables, or use versioning software.
In deciding what to store in the run book, remember to include only what you need. For example, although archived database object scripts and sqldiag outputs are of use in supporting the production system and should be readily available to you, you do not need to include them in the run book. Organization and simplification are the goals. You want to be able to quickly find what you are storing.
This chapter has given you an introduction to the processes and procedures for managing control and change in your database environment. It has shown you how to manage change to software and hardware, system processes, and code. You have also seen the elements that should be considered when managing configuration, specifically the items you should consider entering in your run book.
For more information about the topics discussed in this chapter, see the following resources:
"Managing Database Change" in the SQL Server 2000 Resource Kit
More information about MOF can be found on the MOF Web site at: http://www.microsoft.com/trainingandservices/default.asp?PageID=enterprise&AnnMenu=operations.