Export (0) Print
Expand All

Chapter 14 - Data Center Availability: Facilities, Staffing, and Operations

The Internet is having a major impact on information systems departments. With online shopping, business-to-business commerce, customer service, and employee communications, companies of all sizes are experiencing significant growth in their information systems. Many small- to medium-sized companies are now faced with managing more data and more systems to provide information when and where it is needed.

Companies with worldwide operations also face the demands of providing an around-the-clock enterprise network, with around-the-clock access to information. If employees around the world need access to a customer database, the database must be available at times that used to be "off-hours."

Enterprise networks and the Internet place high demands on information systems, while providing little opportunity for maintenance. As system size and system availability requirements increase, facilities and staffing become more important for optimal system performance.

The purpose of this chapter is to provide facility and staffing recommendations for high availability, and to suggest operational guidelines specific to data centers and Microsoft® SQL Server™. The chapter ends with a brief discussion of Application Service Providers, an alternative to running your own data center.

Data Centers

You have data—lots of it. How do you store, secure, move, and manage that data? You definitely need the hardware, software, and network infrastructure to physically host and move the data. You also need staff to manage the systems and data. And, you need facilities to keep the systems (and staff) secure and operational. Data, systems, staff, and facilities comprise a data center.

There is no one type of data center. Depending on business size, budget, and data availability requirements, data center operations can vary dramatically. The volume of servers and other hardware, the required support services, and the number of employees determines the type of data center required:

  • Small data centers may be located in a small room in a building. The room often has only enough room for equipment, may or may not be secure, and provides limited protection for systems in an emergency. These small data centers are sometimes referred to as "data closets." 

  • Mid-sized data centers are often located in the same building as other business operations, but have separate security systems to restrict access to the data. Inside the data center may be mainframes, servers, automated backup equipment, offices, and climate control systems. Support services, such as power and telecommunications, are often redundant so the data center can be operational when a service is lost. 

  • Large data centers may occupy entire buildings. Data security and redundant systems are provided throughout such buildings. Emergency backup systems allow the data center to be operational in case of a major disaster, such as a large earthquake or long-term power outage. This is especially important to businesses such as banks and insurance companies that need systems to be operational during such emergencies. 

The term "data center" can mean these medium to large facilities.

Facility and Equipment Requirements

A data center must provide reliable access to data. Several factors determine the availability of data: Facilities must support availability requirements; equipment must be high quality and designed to meet availability and performance requirements; and communication between components must support those requirements.

The Data Center Facility

A data center facility is designed to ensure system availability and protect against data loss. During a fire, power loss, or equipment failure, the redundant systems of a well-planned data center can mean the difference between data availability and data disaster. Here are some facility recommendations.

  • Raised Floors 

    All medium to large data centers should consider raised floors. A raised floor provides space for the massive amounts of cable required in a data center, and can simplify the process of adding and moving equipment. Another benefit of a raised floor is cooling: cool air can be pushed under raised floors and directed at servers and other heat-sensitive equipment. 

  • Fire Suppression 

    Good smoke detectors and fire extinguishers are crucial. Gas (such as FM200) and water systems are commonly used for fire suppression. Smoke detectors and temperature sensors should be located throughout the data center so conditions can be monitored and controlled in zones. Make sure the fire suppression system can be started and stopped manually. 

  • Temperature and Humidity Control 

    Computer equipment reliability is better in cool conditions — 68 degrees Fahrenheit (20 C) is reasonable for both equipment and people. While desktop computers and individual servers use fans to keep the CPU cool, fans do not provide enough cooling for a data center.

    Small data centers often rely on the air conditioning in their building. Larger data centers have more equipment, producing more heat, and need a dedicated, redundant cooling system. Make sure emergency generators can power the cooling systems. 

    Humidity is also a concern for data centers. High humidity can lead to condensation on equipment. Very low humidity can lead to excess static electricity. Large fluctuations in humidity can cause circuit boards to expand and contract, damaging circuitry. A good target is 40–45 percent relative humidity. 

  • Redundant Power 

    Power outages can be either widespread or very local. Preparing for both possibilities will reduce the odds that power loss will cause a significant service interruption. 

    Bad weather is not the only cause of a widespread power outage. Other common reasons for power outages include blown transformers, construction, and accidents. When an event like this occurs and power to the data center is lost, a battery backup system can supply enough power for an orderly shutdown. If systems must be operational, install redundant backup generators to power critical equipment.

    Within the data center, blown circuits or damaged wiring can cause an outage to a rack or to individual components. Redundant power supplies to each rack of equipment can prevent a blown circuit from causing downtime; if a main circuit loses power, a redundant power supply automatically switches the rack to a second power supply. For incidental electrical problems, make sure to stock spare parts on site.

  • Data Connectivity 

    To provide communications with people outside the data center, the facility needs voice and high-speed data connections. Like power systems, communication systems should be redundant. If one carrier loses service or must take their system down for maintenance, having a second carrier will allow users to keep accessing the information they need.

    To prevent damage to multiple lines during construction or maintenance, the redundant lines should enter the data center at different locations. Also, for optimal service and cost effectiveness, the facility should be near an Internet hub. 

  • Backup Systems and Off-Site Storage 

    Databases and file systems must be backed up frequently according to a defined schedule. Small data centers might manually back up data, while larger data centers typically need automated backup devices, such as tape libraries that mechanically insert and remove tapes.

    Store backups off site. If backups are stored on site, a disaster at the data center might destroy both production and the backups. 

  • Security 

    Stolen data is a big concern for most companies. Intruders can access data in person by physically entering the data center, or can obtain virtual access through a network connection. 

    To stop intruders from entering the data center, require staff and visitors to provide credentials on entry. Security cameras are also helpful.To stop virtual intruders, enforce security on each server, secure the network using a firewall, and require employees to use strong passwords that change frequently. The best defense against network attacks is a well-trained staff that stays current with "hacking" trends. 

  • Space 

    A data center should provide enough room for equipment to be organized, room for growth, and room for staff. Relocating equipment often causes downtime. Adequate room for employees is important for employee productivity. 

Data Center Hardware

Once the data center facility is ready, you will need to bring in the servers and network infrastructure to support your systems. SQL Server Books Online provides absolute minimum hardware configurations for SQL Server 2000. Necessary hardware configurations depend on the performance and availability requirements.

Hardware requirements are often system specific. Individual hardware and application vendors often provide system requirements or white papers that include hardware, configuration, and tuning guidelines. Follow vendor guidelines for hardware selection, as well as other system requirements.

How a database is used also affects hardware selection. Large databases with higher transaction throughput require lots of memory and a robust disk subsystem. In any database that sustains heavy read or write activity, separate the system files, data files, and log files on multiple disks and controllers to improve performance by load-balancing disk reads and writes. For very large or busy databases, further performance improvements are possible by spreading the data files across multiple drives.

  • Finally, availability requirements also affect hardware selection. For some applications, downtime is very expensive, so a backup system must be available at all times. Availability solutions such as SQL Server 2000 failover clustering, log shipping, and replication require multiple servers.

For more information on SQL Server hardware configurations, see Chapter 33, "The Data Tier: An Approach to Database Optimization."

Data Communication Within the Data Center

Within a data center, communication rates between components impact system performance. For example, if you have a group of servers that store data on an array of disks, the communication rate between the servers and the storage array affects system response time. If communication between system components occurs over a shared network, any network congestion will slow system performance.

High-speed interconnects, often referred to as SANs, can improve performance within a data center. Note that the acronym SAN has multiple meanings:

  • A storage area network is an interconnect between multiple servers and a shared resource, such as a storage subsystem. Fibre Channel is typical for this type of interconnect. 

  • A system area network is an interconnect between servers, and is often used for communication within a cluster or federation of servers. An example of this type of interconnect is Giganet's cLAN. 

A new standard for interconnects is called the Virtual Interface Architecture (VIA). The standard, created by Compaq, Intel, Microsoft, and other companies, defines a protocol for a high-speed messaging interface that reduces communication latency between connected devices. This latency reduction is accomplished by using hardware to create virtual connections between devices, bypassing slower network protocols such as TCP/IP.

SQL Server 2000 Enterprise Edition supports the Giganet VIA SAN implementation. Because system area networks are intended to support fast communications between servers, SQL Server 2000 supports the VIA Net-Libraries only on Microsoft Windows NT® Server and the Windows® 2000 Server, Advanced Server, and Datacenter Server operating systems. You can download updated tools and libraries at http://www.microsoft.com/sql/.

Staffing Recommendations

Running a data center requires a highly trained staff that can plan and implement complex systems, and can respond quickly when problems arise. Microsoft Consulting Services (MCS) works with organizations to develop systems and recommend system support procedures, and has defined several roles for implementing and supporting SQL Server in a data center.

Note that these roles do not necessarily map to individual employees. In smaller organizations, a team of two or three people might cover all the roles. In a larger organization, the number of servers, databases, and tasks may require multiple people to fill one role. Also, in your organization you may use different names. For example, your organization may have a Database Administrator (DBA) that fills the roles of the Deployment Specialist and SQL Developer.

Deployment Specialist

The Deployment Specialist is responsible for the installation, configuration, and operation of SQL Server. To meet the requirements of this role, the Deployment Specialist performs the following tasks:

  • Works with the staff to determine the requirements for SQL Server installations. 

  • Installs and configures SQL Server on test machines. 

  • Creates setup scripts for quick installation on other servers and for server recovery. 

  • Documents all SQL Server installations before moving them to production, and documents all changes made to SQL Server in production. 

  • Deploys SQL Server to production. 

  • Maintains SQL Server installations. 

In high-availability configurations, the Deployment Specialist also configures and manages failover clustering, linked servers, and replication.

Security Specialist

The Security Specialist develops and implements security plans for each server. A security plan ensures that the server and databases are secure, and that users can access the data they need. The Security Specialist typically performs the following tasks:

  • Determines which SQL Server security mode to implement on each instance of SQL Server. 

  • Administers server login accounts for individuals and groups. 

  • Administers database user accounts and roles. 

  • Creates scripts for restoring security settings. 

  • Manages permissions of applications within SQL Server. 

If the environment supports the passing of security credentials between servers (delegation), the Security Specialist configures delegation.

Disaster Recovery Specialist

The Disaster Recovery Specialist develops and implements database backup and recovery strategies. This role is crucial for availability of each database. The Disaster Recovery Specialist typically performs the following tasks:

  • Works with Application Specialist and SQL Developer to create a reliable backup and recovery plan according to application usage. 

  • Documents the backup and recovery plan for each database. 

  • Creates automated jobs and alerts to implement backups. 

  • Creates jobs for on-demand backup and recovery. 

  • Tests backups to ensure they will recover the databases. 

  • Works with Deployment Specialist to develop server recovery procedures. 

The Disaster Recovery Specialist owns the backup and recovery plans, and must update the plans as necessary. This requires knowledge of the entire system, and input from other staff members.

SQL Developer

The SQL Developer designs and creates databases. In some organizations, this role is performed by a DBA. The SQL Developer typically performs the following tasks:

  • Designs databases according to customer and application specifications. 

  • Implements database schemas, files, and indexes. 

  • Tunes databases for performance. 

  • Trains support specialists on the design and function of each database. 

  • Documents the schema and functions of each database for support specialists and future developers. 

  • Works with the Disaster Recovery Specialist to create a recovery plan for each database. 

  • Provides SQL scripts for rebuilding each database. 

Once a system is in production, the role of the SQL Developer in that database is reduced. The developer may be involved with implementing significant changes and performing some maintenance.

Application Specialist

The role of the Application Specialist is to know third party and custom applications, and to work with other staff members to integrate and troubleshoot the applications. The Application Specialist typically performs the following tasks:

  • Learns how to use the database applications, and how the applications use the databases. 

  • Works with the System Continuity Specialist to identify key application monitors. 

  • Works with the SQL Developer and Deployment Specialists to ensure applications are making optimal use of SQL Server. 

  • Provides documentation for application support and recovery. 

For Web applications, knowledge of Web technologies such as XML and scripting is important as well.

System Continuity Specialist

The System Continuity Specialist develops and implements the monitoring strategy for servers, databases, and applications. The System Continuity Specialist typically performs the following tasks:

  • Determines which tools to use to monitor hardware, software, and applications. 

  • Picks the counters and conditions to monitor. 

  • Creates monitoring templates for tools like SQL Profiler and System Monitor. 

  • Creates the baseline reports used to compare system performance over time and after significant system changes. 

  • Defines the monitoring schedule and creates automated jobs for common monitoring tasks. 

  • Documents the monitoring strategy. 

  • Works with other staff members to troubleshoot poor performance. 

Monitoring Specialist

The Monitoring Specialist performs day-to-day monitoring tasks, working closely with the System Continuity Specialist and Trend Analysis Specialist. The Monitoring Specialist typically performs the following tasks:

  • Works with the System Continuity Specialist to create the monitoring strategy. 

  • Ensures monitoring jobs are running. 

  • Responds to system alerts and provides problem details to other staff. 

  • Consistently documents system performance and creates reports for the Trend Analysis Specialist. 

Note Monitors should always be running, so starting and stopping monitoring jobs should not be a common task.

Trend Analysis Specialist

The Trend Analysis Specialist looks at system performance over time and after significant system changes, and determines when to take measures to improve performance. The Trend Analysis Specialist typically performs the following tasks:

  • Analyzes monitoring reports and looks for performance trends. 

  • Works with the staff to determine the causes of performance degradation. 

  • Works with the staff to find permanent solutions to performance degradation. 

Quality Assurance Specialist

The Quality Assurance Specialist ensures all servers, databases, and applications are fully tested and comply with all operational guidelines before being released to production. The Quality Assurance Specialist typically performs the following tasks:

  • Defines quality assurance guidelines, which define minimum standards for production systems. 

  • Ensures all production systems pass quality assurance guidelines. 

  • Creates the change control process, which is a procedure used for implementing changes on production systems. 

  • Owns or manages the use of a test lab where all new or modified systems are tested. 

Hardware Specialist

The Hardware Specialist is responsible for the selection, configuration, and operation of data center hardware. The Hardware Specialist typically performs the following tasks:

  • Recommends server, storage, and network components. 

  • Installs and configures hardware. 

  • Works with the Trend Analysis Specialist and other staff to determine when hardware upgrades and additions are necessary. 

The hardware specialist is also responsible for hardware during disaster recovery, and needs to make sure adequate supplies of hardware are available for emergencies.

Support Specialist

The Support Specialist provides customer support, and is the primary customer contact. The Support Specialist typically performs the following tasks:

  • Learns customer applications and how the applications integrate into data center systems. 

  • Takes support calls from customers and helps them troubleshoot basic problems. 

  • Creates a support escalation procedure. 

  • Works with the rest of the team to define a troubleshooting procedure. 

  • Documents customer concerns and system bugs. 

Operational Guidelines

Facilities and staffing are critical to data center operations. In addition, establishing a set of guidelines and tasks for the daily operations of a data center will help the data center run smoothly and be prepared for emergencies.

General Operations

Each system in a data center has its own operational guidelines. But the data center itself must have operational guidelines to ensure quality, to control changes to production, and to prepare for emergencies.

Quality Assurance

Quality Assurance is the process of determining if a system meets predefined measures of quality. In a data center, a quality assurance process ensures that new or changed systems work as specified, integrate well with other systems, and are documented so they can be maintained and recovered. For example, here is a high-level process for new systems:

  • Determine a set of standards that must be met by the application. For example, the application needs to run in a test lab for two weeks without failure. 

    Configure the test lab for integration testing, and test the new system:

    • Determine if application integration standards are met.

    • Simulate a variety of server loads to evaluate system performance.

    • Test how the system responds to errors. 

  • Evaluate performance against an actual baseline. 

  • Determine if additional hardware is necessary. If it is, configure the additional hardware and re-test the system. 

  • Ensure documentation exists, as well as scripts for installation and recovery. 

Once the system passes the quality assurance process and is released on the production servers, create a new performance baseline, and then continue to monitor the system.

Change Control

A data center is not a stable environment. Equipment, operating systems, and applications change frequently to improve or maintain performance. In such a complex environment, a change on one system can cause unexpected problems. To reduce the possibility of problems, changes must be thoroughly planned. Change control is the process of ensuring that changes are planned and tested before being moved to production, and that changes to production are scheduled and documented.

Each data center needs to develop a change control procedure to meet the needs of the staff, management, and systems. However, most change control processes follow a similar process, such as the following:

  1. In a document created from a change control template, a customer or a staff member proposes a change to production. 

  2. A small committee reviews the proposal, and either agrees to implement a test system or denies the proposal. 

  3. If the staff agrees to pursue the change, the staff implements the changes in the test lab. Much of the staff may be involved in this process, especially if hardware and security changes are necessary. 

  4. The Quality Assurance Specialist leads the testing effort, making sure the new or changed system is performing as expected, and that other systems are not negatively impacted. 

  5. If the system performs well in the lab, the Deployment Specialist makes the final decision on implementing the changes in production. 

  6. The Quality Assurance Specialist makes sure all quality assurance guidelines are met before approving the system for production, the System Continuity Specialist develops a monitoring plan, and the Disaster Recovery Specialist develops a backup and recovery plan. 

  7. The Deployment Specialist, along with any other necessary staff, implements the changes in production during a change control window, and according to change control guidelines. 

  8. The Monitoring Specialist monitors the new system, watching for unexpected impacts. 

The change control process can differ significantly based on the complexity of the change. A simple change, such as adding an alert to a database, may have an expedited change control process. A complex change, such as installing an application, requires more decisions and a more substantial process.

Here are some change control suggestions:

  • Justify all changes before modifying a production system.

  • Ensure that all changes run successfully in a test lab before implementing in production. And make sure to document all changes. 

  • Use Sunday evening, or another low-impact time, to update production. A set change-control time window will reduce the impact on production, and will help enforce change control guidelines. 

Emergency Preparedness

During an emergency or system failure, you need to have a recovery plan with which everyone is familiar and comfortable. Here are some recommendations for emergency preparedness:

  • Provide up-to-date contact information for everyone in the data center, and everyone you may need to contact in an emergency. Include home and work phone numbers, pager numbers, and backup contacts. 

  • Have recovery instructions for all equipment. Do not store the recovery instructions online in a system that may itself need to be recovered. 

  • Stock spare parts, especially disk drives. 

  • Practice. Run fire drills. Recover test systems from production data. 

SQL Server Operations

The following guidelines provide a starting point for SQL Server operations. Organizations need to customize their SQL Server operations to their systems and availability requirements.

Security

Before any system is placed in production, the Security Specialist must work with the Deployment Specialist, Application Specialist, and SQL Developer to make sure the system is secure. The following is a short list of security recommendations:

  • Do not use the sa password, which is provided in SQL Server 2000 for backward compatibility only. Instead, use the default mode for SQL Server 2000, which is to disable the sa password. If you enable the sa password during installation, set a password immediately.

  • For user and group logins, Windows Authentication is more secure and easier to administer than SQL Server Authentication. In this mode, administrators grant SQL Server access to Windows users and groups, and SQL Server relies solely on the Windows authentication of the user. User passwords are not stored in SQL Server; however, applications and Web sites often need to use SQL Server Authentication to establish a connection with a database. If you need to use SQL Server Authentication for some purposes, and can use Windows Authentication for some users, use Mixed Mode security. 

  • If a server uses SQL Server Authentication, create scripts for the recovery of security information. When a database is moved from one server to another, database users are transferred, but server logins are not, which can cause a mismatch between logins and users. The script needs to establish all necessary logins on the new server, and then run sp_change_users_login to match logins with users. 

  • Restrict or disable the guest user accounts. If a person has a SQL Server login account, but does not have explicit access to a database, that person has guest permissions on every database with a guest user account. Also, application roles can gain access to other databases through the guest user account. 

  • Create set procedures for adding and removing accounts. 

For more information see Chapter 10, "Implementing Security."

Monitoring

Monitoring is the only good way to know how systems are performing. Without monitoring, you may not be aware when downtime occurs.

For SQL Server, the primary areas to monitor are disk usage, memory usage, CPU performance, and SQL Server performance. Two tools commonly used to monitor performance are System Monitor and SQL Profiler. For a complete list of SQL Server tools, see "Choosing a Monitoring Tool" in SQL Server Books Online.

System Monitor

System Monitor is a Windows 2000 tool used to collect and view real-time data about system performance. Use System Monitor to establish a performance baseline, and to constantly monitor system performance. To analyze performance trends, compare current performance against the baseline.

The following lists of counters and events provide a starting point for monitoring SQL Server. System Continuity Specialists should create a custom monitoring system for individual servers, databases, and applications. For a complete list of applicable counters, see "Monitoring with System Monitor" in SQL Server Books Online.

Disk usage counters:

  • Logical Disk: % Free Space — Make sure you have adequate free space for data and log file growth. Set an alert when the percentage of free space is low, typically between 10 and 30 percent, depending on database size and growth rate. 

  • Physical Disk: Avg Disk Queue Length — The average disk queue length indicates the number of disk requests that are on hold because the disk cannot keep up with requests. If this number is greater than two on any disk, consider adjusting the disk configuration or rearranging data. This counter is also available for logical disks. 

  • Physical Disk: % Disk Time — This counter indicates how often the disk is busy servicing reads or writes. If this number is greater than 90 percent on any disk, consider adjusting the disk configuration or rearranging data. You can also compare the Disk Write Bytes/sec, Disk Writes/sec, Disk Read Bytes/sec, and Disk Reads/sec counters to find out what type of activity is causing high usage rates. These counters are also available for logical disks. 

Memory counters:

  • Process: Page Faults/sec: sqlservr — This counter indicates the number of times the sqlservr process could not find data in memory and had to go to disk to find the data. This number should be at or near 0 during normal operations. Two other counters, Memory: Pages/sec and Memory:Page Faults/sec, can help you analyze total system memory. If these counters are high, such as over 20, another process may be using too much memory, or the server needs additional memory. 

  • SQL Server: Buffer Manager: Buffer Cache Hit Ratio — This counter indicates the percentage of requested data that exists in memory. The counter should be 90 percent or greater. If the hit ratio is too low, add more memory to the server. 

Processor counters:

  • System: Processor Queue Length — This counter indicates how many threads are waiting for the processor. If this number is greater than two per CPU, tune your databases or provide more processing power. Slow performance, a queue of 0, and low CPU usage may indicate a performance bottleneck elsewhere, such as the disk subsystem. 

  • Processor: % Processor Time — If the server is running SQL Server only, and the processor active time is greater than 90 percent, you probably need to adjust how applications are using the database, tune the database, or increase processing power. 

  • Process: % Processor Time: sqlservr — This counter indicates the percentage of time that all threads of the sqlservr process are using to execute instructions. Compare this to % Processor Time. If other processes are consuming too much time, terminate nonessential processes or move them to other servers.

SQL Server and database counters:

  • SQL Server: General Statistics: User Connections — Once you set user connections to the maximum expected number of concurrent users, use this counter to track the number of user connections. Set an alert to notify when the number of current user connections reaches the maximum. 

  • SQL Server Locks: Number of Deadlocks/sec — This counter should be 0. When two threads, each having a lock on one piece of data, attempt to acquire a lock on the other's piece of data, the threads are deadlocked. SQL Server does break deadlocks, but it is best to find the source of deadlocks and fix the problem, or deal with deadlocks in the application. 

  • SQL Server: Databases: Transactions/sec — This counter shows how many transactions per second the specified database is supporting. This information is important for trend analysis. 

  • SQL Server: Databases: Percent Log Used — This counter indicates how full the transaction log is for the specified database. Create a log backup job that is triggered whenever the log is almost full, such as when it reaches 80 percent capacity. Send a notification when the job runs successfully, and send an alert when the job fails. 

  • SQL Server: Databases: Data File(s) Size (KB) This counter indicates the combined size of all the data files that make up the specified database, including any automatic growth. To minimize autogrow operations, which generate a lot of I/O, make sure this counter is large enough to hold the data that will be stored in the database. This is especially true with the tempdb database. The tempdb database is recreated each time SQL Server starts. If the size specified for the tempdb data files is too small, then overall performance may be impacted by numerous tempdb autogrow operations. Compare the size reported for tempdb by this counter to the space allocation reported when you use SQL Server Enterprise Manager to display the properties of tempdb. If the counter is reporting a much higher size, increase the space allocation defined for tempdb in SQL Server Enterprise Manager. 

  • SQL Server: Databases: Backup/Restore Throughput/sec — Use this counter to monitor data throughput during backup and restore operations. You can test multiple configurations to find out which configuration is more efficient. 

Many more counters are available for SQL Server. For example, the SQL Server: Access Methods object provides several counters to monitor how the logical pages within a database are accessed, including index usage. For more information about choosing counters to monitor for performance, see Inside Microsoft SQL Server 2000 by Kalen Delaney.

SQL Profiler

If System Monitor counters do not indicate performance issues, but a server has periodic poor performance, use SQL Profiler to analyze SQL Server activity. Monitoring tools such as SQL Profiler use system resources, so limit the number and complexity of traces. Here are some examples of how to use SQL Profiler:

  • List all long-running queries. 

  • Count all table scans, which are an indication that indexes are not being used. 

  • Analyze the queries that are creating deadlocks. 

  • Analyze the activity of a specific database user. 

  • Create a complete sample of database activity for the Index Tuning Wizard.

SQL Profiler provides templates for many common analysis scenarios. You also can create your own templates for custom scenarios. For more information about SQL Profiler, see "Monitoring with SQL Profiler" in SQL Server Books Online.

Backup and Recovery

A backup and recovery plan is critical to the availability of a data center, and requires more than simply backing up data on a regular schedule. For large databases, backing up the entire database can take more time than is available. And, if backup is slow, recovery will likely be slow as well. Consider the following when creating a backup and recovery plan:

  • Create a step-by-step system recovery plan and keep it up-to-date. 

  • Create recovery scripts for configuration settings, databases, logins, and users. 

  • Store backups in a secure, off site location. 

  • Test your recovery plan using real backups. 

  • Create an alert to notify the Disaster Recovery Specialist when the log is 80 percent full (or less, if necessary). 

Maintenance

Although SQL Server is a self-tuning system that requires little maintenance, you do need to maintain databases and indexes and watch system performance. The following tasks will help maintain database performance.

  • Maintain indexes. Creating indexes can greatly improve performance, but index fragmentation and changes in usage patterns can make indexes less useful to queries. Analyze index performance, and defragment or rebuild indexes as necessary. The Index Tuning Wizard is very useful for this task. 

  • Update statistics. SQL Server uses distribution statistics to improve its use of indexes. If you have turned off the automatic updating of statistics for performance reasons, you need to frequently update statistics. This is especially important with significant changes to data or changes to the distribution of data.

  • Maintain load balance. As a database grows, the balance of data and activity across disks can change. If one disk becomes more active than others, redistribute files to rebalance the load. 

  • Stock spare disk drives, cables, installation disks, and other critical components. 

  • Document the amount of downtime you experience in the data center. This will help measure how well the data center is operating. 

For easy maintenance, automate as many maintenance tasks as you can. For example, let SQL Server statistics update automatically, or create a job that updates statistics.

Application Service Providers

Some companies may not want to invest in the equipment and skills necessary to run a data center. For these companies, Application Service Providers (ASPs) provide another option. ASPs offer various levels of application and system hosting, such as:

  • Customers simply leasing rack space for their own servers. 

  • Customers leasing space on a server, and managing their own applications and data. 

  • Customers receiving comprehensive services, including system development, hosting, and maintenance. 

Internet Service Providers (ISPs), a similar business model, focus on providing Internet access, Web site hosting, and Web site development. ISPs typically do not provide the same range of services as ASPs.

Using an ASP

ASPs provide valuable services, but when does using an ASP make the most sense? This decision is often based on several factors, including the company size, the reason for outsourcing, and the company's comfort with the ASP model. Here are some reasons companies use ASPs:

  • Small companies can benefit from economies of scale—the large ASPs may provide service at a lower cost, especially if the company has not already invested in data center staff and equipment. 

  • ASP fees are negotiated according to levels of service, and can be more easily budgeted. 

  • If data center employees are difficult to find and keep, an ASP can offset this cost by providing development and maintenance services.

  • ASPs sometimes specialize in certain applications, which can greatly reduce development and deployment time. 

  • If your business has extremely busy periods, an ASP may be able to host extra servers to temporarily increase capacity. 

Before choosing an ASP, consider the following:

  • For a larger company with an existing IT organization, an ASP may be more expensive. Analyze service level agreements, as well as fees for services above the negotiated level.

  • A company that uses an ASP is not free from managing information. Customers, suppliers, or employees use the data, and the company is still responsible for that. The company also needs to manage the relationship with the ASP, including training the ASP on existing applications, negotiating terms of service, and evolving IT operations as they relate to business plans. 

  • Some companies are not comfortable with an ASP hosting critical or sensitive company data. However, ASPs are not an all-or-nothing choice. You may wish to outsource some applications and not others. 

ASPs and Multiple Instances of SQL Server

SQL Server 2000 supports multiple instances of the relational database engine on the same server. This allows an ASP to more efficiently use resources. If a group of customer databases can run on one server, the ASP can install one default instance and multiple named instances on that one server, then define administrative privileges for each instance.

Summary

Running a data center is a complex task. However, with the proper facilities, a well-trained staff, and good operational procedures, a data center can provide the availability and reliability necessary for Internet and enterprise services.

The qualities of a well-planned data center include:

  • Facilities that provide redundant support systems and enough space to properly host all people and equipment.

  • A well trained staff that fills the roles necessary for planning, running, and supporting data center equipment and operations.

  • Documented operational procedures that the staff uses to ensure maximum performance and system availability. 

Not all companies want to, or can, run their own data center. For these companies, ASPs provide another option.

For more information about data center planning and operations, see http://www.microsoft.com/mof/.

Was this page helpful?
(1500 characters remaining)
Thank you for your feedback
Show:
© 2014 Microsoft