Storage (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

Storage questions are grouped into three categories: general questions, questions about deploying Server clusters on a storage area network (SAN), and questions about network attached storage (NAS).

General Storage Questions

Q. What storage interconnects are supported by Server clusters?

A. Cluster Server does not limit the type of storage interconnects supported; however, some requirements of the storage subsystem limit the choices in practice. For example, all of the cluster nodes should be able to access the storage device, so only interconnects that allow multiple initiators (that is, nodes) can be used. The interconnects that are currently part of qualified configurations on the HCL are SCSI (of various flavors), fibre channel arbitrated loop, and fibre channel switched fabrics.

Remember that only clusters where the full configuration appears on the Cluster HCL are supported by Microsoft.

Q. How do I configure the nodes and the storage on a SCSI cluster?

A. You must make sure that all of the devices on the SCSI bus have different SCSI IDs. By default, SCSI adapters tend to have ID 7, so you should make sure that the adapters in each node have different IDs. Likewise, the disks should be given unique SCSI IDs on the bus.
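The uniqueness requirement above can be checked mechanically before the bus is cabled. The following is a minimal sketch, assuming the bus inventory has already been written down by hand as a mapping of device name to SCSI ID; the device names and the helper `find_duplicate_scsi_ids` are hypothetical and are not part of any Microsoft tool.

```python
def find_duplicate_scsi_ids(devices):
    """Return {scsi_id: [device names]} for every ID used more than once on the bus."""
    by_id = {}
    for name, scsi_id in devices.items():
        by_id.setdefault(scsi_id, []).append(name)
    return {
        scsi_id: sorted(names)
        for scsi_id, names in by_id.items()
        if len(names) > 1
    }


# Hypothetical inventory of one shared SCSI bus.
# Both adapters were left at the common default ID of 7, which is a conflict.
shared_bus = {
    "node1-adapter": 7,
    "node2-adapter": 7,
    "disk-a": 0,
    "disk-b": 1,
}

conflicts = find_duplicate_scsi_ids(shared_bus)
print(conflicts)
```

An empty result means every device on the bus has its own ID. Any entry in the result names the devices that need to be reconfigured.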

For a SCSI bus to work correctly it must be terminated. There are many ways to terminate the bus, both internally (at the host adapter) and externally (using Y cables). To ensure that the cluster can survive different types of failures (specifically being able to power down one of the nodes), the SCSI bus should be terminated using passive components such as a Y cable. Internal termination, which requires the adapter to be powered up, is not recommended.

Note

Microsoft only allows 2-node clusters to be built using SCSI storage interconnects.

Q. Does Server clustering support fibre channel arbitrated loop (FC-AL)?

A. Yes; however, Microsoft only allows 2-node clusters to be built using FC-AL storage interconnects. Multiple clusters on a single fibre channel loop are NOT supported.

Q. Can multiple clusters be connected to the same storage controller?

A. Yes, there is a special device qualification test for storage controllers that ensures they respond correctly if multiple clusters are attached to the same controller. For multiple clusters to be connected to the same controller, the storage controller MUST appear on the Multi-cluster device Hardware Compatibility List (HCL) AND each of the individual end-to-end cluster solutions must appear on the Cluster Hardware Compatibility List. For example: EMC Symmetrix 5.0 is on the multi-cluster device HCL. Multiple clusters (say a Dell PowerEdge cluster and a Compaq Proliant cluster) can be connected to the EMC Symmetrix storage controller as long as Dell PowerEdge + EMC Symmetrix AND Compaq Proliant + EMC Symmetrix are both on the cluster HCL.
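The two-part rule above (controller on the multi-cluster device HCL, and every cluster and controller pairing on the Cluster HCL) can be expressed as a small check. This is only a sketch: the sets below are illustrative stand-ins built from the example in the answer, not real HCL data, and the function name `multi_cluster_supported` is hypothetical.

```python
def multi_cluster_supported(controller, cluster_servers, multi_cluster_hcl, cluster_hcl):
    """Return True only if both HCL conditions from the answer above hold."""
    # Condition 1: the controller itself is qualified for multi-cluster use.
    if controller not in multi_cluster_hcl:
        return False
    # Condition 2: every end-to-end (server, controller) solution is on the Cluster HCL.
    return all((server, controller) in cluster_hcl for server in cluster_servers)


# Illustrative stand-ins for the two lists (assumed contents, not real HCL data).
multi_cluster_hcl = {"EMC Symmetrix"}
cluster_hcl = {
    ("Dell PowerEdge", "EMC Symmetrix"),
    ("Compaq Proliant", "EMC Symmetrix"),
}

both_listed = multi_cluster_supported(
    "EMC Symmetrix", ["Dell PowerEdge", "Compaq Proliant"], multi_cluster_hcl, cluster_hcl
)
one_missing = multi_cluster_supported(
    "EMC Symmetrix", ["Dell PowerEdge", "Hypothetical Server"], multi_cluster_hcl, cluster_hcl
)
print(both_listed, one_missing)
```

A single unlisted pairing is enough to make the whole multi-cluster configuration unsupported, which is why the check uses `all`.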

Q. Will failover occur if the storage cable is pulled from the host bus adapter (HBA)?

A. If the storage cable is pulled from the host bus adapter (HBA), there may be a pause before the adapter reacts to losing the connection. Once the HBA has detected the communication failure, the disk resources in the cluster that use that HBA will fail. This triggers a failover, and the resources are brought back online on another node in the cluster.

If the storage cable is reconnected, the Windows operating system may not rescan for new hardware automatically (this depends on the driver used for the adapter). You may need to manually initiate a rescan for new devices. Once the rescan is done, the node can host any of the physical disk resources. If failback policies are set, any resources that failed over when the cable was removed will fail back to the node when the cable is replaced.

Note

An HBA is the storage interface that is deployed in the server. Typically this is a PCI card that connects the server to the storage fabric.

Q. Will Server clusters protect my disks from hardware failures?

A. No. Server clusters protect against server failure, operating system or application failure, and planned downtime due to maintenance. Microsoft strongly suggests that application and user data be protected against disk failure using redundancy techniques such as mirroring, RAID, or replication, either in hardware or in software.

Q. Do Server clusters support RAID or mirrored disks?

A. Yes. Microsoft strongly suggests that application and user data be protected against disk failure using redundancy techniques such as mirroring, RAID, or replication, either in hardware or in software.

Q. Are dynamic disks supported in a cluster?

A. Windows server products shipped from Microsoft do not provide support for dynamic disks in a server cluster environment. The Volume Manager for Windows 2000 add-on product from Veritas can be used to add the dynamic disk features to a server cluster. When the Veritas Volume Manager product is installed on a cluster, Veritas should be the first point of support for cluster issues.

Q. Can a cluster disk be extended without rebooting?

A. Yes, cluster disks can be extended without rebooting if the storage controller supports dynamic expansion of the underlying physical disk. Many new storage controllers virtualize the Logical Units (LUNs) presented to the operating system and these controllers allow LUNs to be grown online from the storage controller management console. Microsoft provides a tool called DiskPart that allows volumes or partitions to be grown online to take advantage of the newly created space on the disk without disruption to applications or users using the disk. There are separate versions of DiskPart for Windows 2000 and Windows Server 2003. The Windows 2000 version is available as a free download on the web and the Windows Server 2003 version is shipped on the distribution media.

Note

A LUN equates to a disk device visible in Disk Administrator.

Q. Can additional disks be added to a cluster without rebooting?

A. Yes, you can insert a new disk or create a new LUN and make that visible to the cluster nodes. You should make the disk visible to only one node in the cluster at first, and then create a cluster resource to protect that disk. Once the disk is protected, you can make the LUN visible to the other nodes in the cluster. In some cases you may need to do a rescan in Device Manager to find the new disk. In other cases (especially with fibre channel) the disk may be automatically detected.

Q. Can disks be removed from the cluster without rebooting?

A. Yes.

Q. What types of disks can be used for cluster disks?

A. Microsoft recommends that all partitions on the cluster disks be formatted with the NTFS file system. This is for two reasons. Firstly, NTFS provides access controls that can be used to secure the data on the disk. Secondly, NTFS can recover from volumes that have been forcibly dismounted; other file systems may become corrupt if they are forcibly dismounted.

Server clusters only support Master Boot Record (MBR) format disks. Cluster disks cannot be GPT format.

Q. Can tape devices or other non-disk devices be attached to the same storage bus as the Cluster Disks?

A. It depends on the storage interconnect. Server clusters use SCSI reserve and reset to arbitrate for cluster disks. In Windows NT and Windows 2000, Cluster server performs an untargeted bus reset. In Windows Server 2003 it is possible to target the reset; however, it may fall back to an untargeted reset. If a tape device receives a reset, this will typically trigger the tape to rewind.

Server clusters do not provide any arbitration mechanism for tape devices that are visible from multiple servers; therefore, tape devices are not protected against concurrent access from multiple servers.

Microsoft does NOT support attaching a tape device to the SCSI bus containing the cluster disks in the case of a SCSI cluster, or to the loop in the case of a fibre channel arbitrated loop cluster.

Tape devices can be attached to a switched fabric as long as they are not visible through the same adapters as the cluster disks. This can be achieved either by putting the tape drive in a different zone from the cluster disks or by using LUN masking techniques.
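The switched-fabric rule above amounts to a simple visibility check: no single adapter should see both a tape device and a cluster disk. A minimal sketch, assuming the per-adapter visibility has been collected by hand from the zoning or LUN masking configuration; the adapter names, the device-kind labels, and the helper name are all hypothetical.

```python
def adapters_seeing_tape_and_cluster_disk(visible_devices):
    """Return the adapters through which both a tape and a cluster disk are visible."""
    return sorted(
        adapter
        for adapter, kinds in visible_devices.items()
        if "tape" in kinds and "cluster-disk" in kinds
    )


# Hypothetical view of one node after zoning: hba0 is zoned with the cluster
# disks only, hba1 with the tape library only, and hba2 is misconfigured.
node_view = {
    "hba0": {"cluster-disk"},
    "hba1": {"tape"},
    "hba2": {"cluster-disk", "tape"},
}

violations = adapters_seeing_tape_and_cluster_disk(node_view)
print(violations)
```

Any adapter listed in the result would need to be rezoned or masked so that the tape device and the cluster disks are no longer reachable through it together.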

Q. Are software fault tolerant disks (software RAID or mirroring) supported in a Server cluster?

A. Windows server products shipped from Microsoft do not provide support for software RAID or mirroring in a Server cluster; however, there are third-party products that provide this functionality in a clustered environment.

Q. Is the Volume Shadow Copy Service (VSS) supported in a Server cluster?

A. Yes. The Volume Shadow Copy Service is new with Windows Server 2003 and provides basic snapshot capabilities that backup applications use to create consistent, single point-in-time backups. The cluster service has a VSS provider that allows the cluster service configuration to be snapshotted and stored as part of the system state by these backup applications.

Q. Do Timewarp snapshots work in a Server cluster?

A. No. Timewarp is a new feature in Windows Server 2003 that allows persistent snapshots to be created and exposed to clients. Timewarp makes use of features that are not cluster-aware, and it is not supported in a cluster at this time.

Q. Are hardware snapshots or business recovery volumes supported in a Server cluster?

A. Yes, you can use facilities in the latest storage controllers to create snapshots of existing volumes. Note, however, that when you create a snapshot of a disk you should NOT expose the snapshot back to the same cluster as the original disk. The cluster service uses the disk signature to uniquely identify a disk. With a snapshot, the disk and the snapshot have the same disk signature, so the cluster service cannot distinguish between them.

If you create a hardware snapshot or a business recovery volume of a cluster disk you should expose the snapshot to another server or cluster (typically a dedicated backup server).
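The signature rule above can be illustrated with a small check that decides whether a snapshot may be exposed to a given server or cluster. This is a sketch only: the signature values are made up, and `safe_to_expose` is a hypothetical helper, not a cluster service API.

```python
def safe_to_expose(candidate_signature, signatures_already_visible):
    """A disk is safe to expose only if no visible disk has the same signature."""
    return candidate_signature not in signatures_already_visible


# Made-up disk signatures for illustration.
original_disk_signature = 0x1A2B3C4D
snapshot_signature = original_disk_signature  # a snapshot copies the signature

production_cluster = {0x1A2B3C4D, 0x5E6F7081}  # already contains the original disk
backup_server = {0x0BADF00D}                   # dedicated backup server

to_same_cluster = safe_to_expose(snapshot_signature, production_cluster)
to_backup_server = safe_to_expose(snapshot_signature, backup_server)
print(to_same_cluster, to_backup_server)
```

The check fails for the cluster that owns the original disk and passes for the separate backup server, which matches the recommendation in the answer.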

Q. What other considerations are there when creating clustered disks?

A. Modern storage controllers provide a virtual view of the storage itself. A physical RAID set can be carved into multiple logical units that are exposed to the operating system as individual disks or LUNs. If you intend to carve up physical disks in this way and expose them as independent LUNs to the hosts, you should think carefully about the I/O characteristics and the failure characteristics. Remember that underneath, there is only a finite bandwidth to each spindle.

Microsoft recommends that you do not create a LUN for use as the quorum disk from the same underlying physical disks that you will be using for applications. The availability of the cluster is directly related to the availability of the quorum disk. If I/Os to the quorum disk take too long, the cluster server will assume that the quorum disk has failed and initiate a failover of the quorum device. At that point, all other cluster related activity is suspended until the quorum device is brought back online.
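The quorum recommendation above can be checked against a LUN layout plan. A minimal sketch, assuming the mapping of each LUN to its underlying physical disks is known from the storage controller configuration; the LUN names, spindle names, and helper name are hypothetical.

```python
def luns_sharing_spindles_with_quorum(lun_layout, quorum_lun):
    """Return the LUNs that are built on any physical disk also used by the quorum LUN."""
    quorum_disks = lun_layout[quorum_lun]
    return sorted(
        lun
        for lun, disks in lun_layout.items()
        if lun != quorum_lun and disks & quorum_disks
    )


# Hypothetical layout: each LUN maps to the set of physical disks behind it.
layout = {
    "quorum": {"spindle-0", "spindle-1"},
    "sql-data": {"spindle-1", "spindle-2", "spindle-3"},  # overlaps on spindle-1
    "file-share": {"spindle-4", "spindle-5"},
}

overlapping = luns_sharing_spindles_with_quorum(layout, "quorum")
print(overlapping)
```

Any LUN in the result competes with the quorum disk for the same spindle bandwidth, so heavy application I/O on it could delay quorum I/O.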

Q. How can I replace a disk that has gone bad in a cluster?

A. The answer depends on the Windows release:

  • Windows NT Enterprise Edition

  • Windows 2000

    • DumpCfg.

    • ClusterRecovery, a Resource Kit tool provided in the Windows Server 2003 Resource Kit

  • Windows Server 2003