Fault Tolerance

Fault tolerance is the ability of a system to continue functioning when part of the system fails. It guards against problems such as disk failures, power outages, and operating system corruption, any of which can damage startup files, system files, or the operating system itself. Windows 2000 Server includes fault-tolerant features.

Although the data is always available and current in a fault-tolerant system, you still need to make tape backups to protect the information on your disk subsystem against user errors and natural disasters. Disk fault tolerance is not an alternative to a backup strategy with off-site storage.

Fault-tolerant disk systems are standardized and categorized in six levels, known as RAID level 0 through level 5. Each level offers a specific mix of performance, reliability, and cost.

Disk Management

Windows 2000 Disk Management includes RAID levels 1 and 5:

Level 1: Mirrored volumes (mirror sets in Windows NT 4.0)

Mirrored volumes provide an identical copy for a selected volume. All data that is written to the primary volume is also written to a secondary volume or mirror. If one disk fails, the system uses data from the other disk. Because each file is stored in two locations, you need twice your usual storage space to implement this.
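The mirroring behavior described above can be illustrated with a small model. The following is only a sketch under stated assumptions: the MirroredVolume class and its methods are hypothetical names invented for this example and are not part of Windows 2000 or any Disk Management interface. Each disk is modeled as an in-memory dictionary rather than real storage.

```python
class MirroredVolume:
    """Toy model of a mirrored volume (hypothetical, for illustration only)."""

    def __init__(self):
        # Each "disk" maps a block number to its contents.
        # A failed disk is represented by None.
        self.disks = [{}, {}]

    def write(self, block, data):
        # Every write goes to each surviving disk, so both copies stay identical.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, index):
        # Simulate the loss of one disk in the mirror.
        self.disks[index] = None

    def read(self, block):
        # Serve the read from the first disk that is still available.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("both disks in the mirror have failed")


volume = MirroredVolume()
volume.write(0, b"payroll")
volume.fail(0)                 # the primary disk is lost
recovered = volume.read(0)     # the read is served from the mirror
```

Because every block is stored once per disk, the model also makes the storage cost visible: two disks hold only one disk's worth of data.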

Level 5: RAID-5 volumes (striping with parity)

RAID-5 volumes stripe data across all the disks in an array. The system generates a small amount of data, called parity information, that is used to reconstruct lost information if a disk fails. RAID 5 is unique because it writes the parity information across all the disks. Data redundancy is achieved by placing each data block and its parity information on different disks in the array, so that a single disk failure never removes both. This level requires a minimum of three disks. As more disks are added to a RAID-5 set, the amount of overhead decreases from the maximum of 50 percent of the usable capacity (that is, three disks are required to store the data that would normally fit on two disks). However, the benefits of having many disks in a RAID-5 set drop off when seven or more disks are used in the set.
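Parity reconstruction and the overhead figures can be worked through in a few lines. This sketch assumes XOR parity, which is the conventional scheme for RAID 5; the function names xor_blocks and parity_overhead are hypothetical and chosen only for this example. The model ignores how parity is rotated across the disks and looks at a single stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead(disk_count):
    """Return parity overhead as (fraction of raw capacity, fraction of usable capacity)."""
    if disk_count < 3:
        raise ValueError("a RAID-5 volume needs at least three disks")
    return 1 / disk_count, 1 / (disk_count - 1)


# One stripe on a three-disk set: two data blocks plus one parity block.
stripe = [b"ABCD", b"EFGH"]
parity = xor_blocks(stripe)

# The disk holding stripe[0] fails; XOR of the survivors restores it.
rebuilt = xor_blocks([stripe[1], parity])

raw_fraction, usable_fraction = parity_overhead(3)
```

With three disks, one disk's worth of space holds parity, which is one-third of the raw capacity or 50 percent measured against the two disks of usable data. With seven disks the same measure falls to one-sixth of the usable data, and each further disk reduces it by less.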

Selecting a RAID Strategy

RAID strategies include hardware and software solutions. Choosing between RAID-1 and RAID-5 volumes depends on your computing environment. Consider the following when selecting a RAID strategy:

  • When compared to RAID-5 volumes, a mirrored volume implementation has a lower entry cost, requires less system memory, provides better overall performance, and does not show performance degradation during a failure. However, its cost per megabyte is higher than that for RAID-5 volumes.

  • A software RAID-5 volume implementation has better read performance and a lower cost per megabyte, but it requires more system memory and loses its performance advantage when a disk in the array is missing.

  • Hardware or software RAID-5 volumes are a good solution for data redundancy in a computing environment in which most activity consists of reading data. For example, you might want to use a RAID-5 volume on a server that is used to maintain all copies of the programs for your site. The RAID-5 volume protects the programs against the loss of a single disk in the striped volume. In addition, read performance improves because reads proceed concurrently across the disks that make up the RAID-5 volume.

  • In an environment in which frequent updates to the information occur, it might be better to use mirrored volumes. However, you can use a RAID-5 volume if you want redundancy and if the storage overhead cost of a mirror is prohibitive.
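The cost trade-off in the considerations above (lower entry cost for a mirror, lower cost per megabyte for RAID 5) can be checked with simple arithmetic. This is a rough sketch: the function cost_per_megabyte is a hypothetical helper, and the disk size and price used below are made-up figures for illustration, not data from any vendor. It counts only disk cost and ignores controllers, memory, and performance.

```python
def cost_per_megabyte(layout, disk_count, disk_size_mb, disk_price):
    """Return the disk cost per usable megabyte for a mirror or a RAID-5 set."""
    if layout == "mirror":
        if disk_count != 2:
            raise ValueError("this model treats a mirror as exactly two disks")
        usable_mb = disk_size_mb                      # one disk of data, one copy
    elif layout == "raid5":
        if disk_count < 3:
            raise ValueError("a RAID-5 volume needs at least three disks")
        usable_mb = (disk_count - 1) * disk_size_mb   # one disk's worth of parity
    else:
        raise ValueError("unknown layout: " + layout)
    return disk_count * disk_price / usable_mb


# Assumed figures: 9,000 MB disks at a price of 300 each.
mirror_entry_cost = 2 * 300.0
raid5_entry_cost = 3 * 300.0
mirror_cost = cost_per_megabyte("mirror", 2, 9000, 300.0)
raid5_cost = cost_per_megabyte("raid5", 3, 9000, 300.0)
```

Under these assumed figures the mirror is cheaper to start (two disks instead of three), while the three-disk RAID-5 set costs less per usable megabyte, and the gap widens as more disks are added to the set.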