Storage Topologies

Applies To: Windows 2000 Server, Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

There are two types of storage I/O technologies supported in Server clusters: parallel SCSI and Fibre Channel. With the release of Microsoft Windows Server 2003, support is provided for SCSI interconnects and Fibre Channel arbitrated loops for two nodes only. For larger configurations (more than two nodes), you will need to use a switched Fibre Channel (fabric) environment.

Issues

Parallel SCSI

Supported in Windows 2000 Advanced Server only, for up to two nodes

SCSI adapters and storage solutions need to be certified

SCSI adapters hosting the shared interconnect should have different SCSI IDs (normally 6 and 7). Ensure that device access requirements are in line with SCSI IDs and their priorities

The SCSI adapter BIOS should be disabled

If devices are daisy-chained, ensure that both ends of the shared bus are terminated

Use physical terminating devices and do not use controller-based or device-based termination

SCSI hubs are not supported

Avoid the use of connector converters (e.g. 68-pin to 50-pin)

Avoid combining multiple device types (single-ended and differential, etc.)

Fibre Channel

Fibre Channel Arbitrated Loops (FC-AL) are supported for up to two nodes

Fibre Channel Fabric (FC-SW) is supported for all larger configurations (more than two nodes)

Components and configuration need to be in the Microsoft Hardware Compatibility List (HCL)

Multi-cluster environments need to be qualified under the multi-cluster section of the HCL

Fault-tolerant drivers and components also need to be certified

Virtualization engines need to be certified.

The switch is the only component that is not currently certified by Microsoft, and it is recommended that the end user get the appropriate interoperability guarantees from the switch vendor before implementing switch fabric topologies. In complicated topologies, where multiple switches are used and connected through ISLs (inter-switch links), it is recommended that the customer work closely with Microsoft and the switch and storage vendors during the implementation phase to ensure that all of the components work well together.

Supported and Qualified Configurations

All server clusters must be qualified to be supported by Microsoft. A qualified configuration has undergone extensive testing using a hardware compatibility test provided by Microsoft. All qualified solutions appear on the HCL, available at https://www.microsoft.com/whdc/hcl/default.mspx. Only cluster solutions listed on the HCL are supported by Microsoft. The complete cluster solution must be listed on the Cluster HCL list; the complete solution includes the servers, the storage adapters, the interconnect type, the storage controllers, and the firmware and driver versions. All of the components must match exactly, including any software, driver, or firmware versions, for the solution to be qualified. The HCL also contains a set of qualified cluster components, but a solution built from qualified components does NOT imply that the solution is qualified.

The cluster component lists have been a source of confusion in the past, and Microsoft will be removing the cluster component lists (such as Cluster/RAID) from the HCL for Windows Server 2003.

Storage Interconnects

Ensure that all storage interconnects used in Server Clusters are in the HCL. This also applies to any additional software that is used to provide fault-tolerant or load-balancing features for adapters and interconnects.

Multiple paths to the storage for high availability: This is a very common feature implemented by almost all storage vendors. It allows users to implement multiple fabrics (normally two) and use them in fault-tolerant or load-balancing configurations. In the past, each vendor had its own implementation that varied greatly from the others and required specific configuration/driver combinations. With the release of Windows Server 2003, Microsoft has developed and supplied vendors with a multi-path driver that will be used by them in place of custom-made drivers. The driver will ship as part of the individual vendor product, but it is expected that all vendor products that Microsoft will support will incorporate this driver. If a vendor insists on using its own driver, that driver will need to be certified and listed in the HCL.

Server Cluster and SANs

Storage Area Networks (SANs) are increasingly being used to host storage that is managed by server clustering. There are some specific requirements driven by the way clusters are implemented and by the fact that not all storage on a SAN may be owned by nodes in a cluster. Some issues that can be translated into best practices are:

Ensure that the SAN configurations are in the Microsoft HCL (multi-cluster section).

When configuring your storage, the following must be implemented:

Zoning - Zoning allows users to sandbox the logical volumes that will be used by a cluster. Any interaction between nodes and storage volumes is isolated to the zone, and other members of the SAN are not affected by it. This feature can be implemented at the controller or switch level, and it is important that users have it in place before installing clustering.

LUN masking - This feature allows users to express, at the controller level, a specific relationship between a LUN and a host. In theory, no other host should be able to see that LUN or manipulate it in any way. However, implementations differ in functionality, and as such one cannot assume that LUN masking will always work. Therefore, it cannot be used instead of zoning. One can, however, combine zoning and masking to meet specific configuration requirements.

Firmware and driver versions - Some vendors implement specific functionality in drivers and firmware, and it is recommended that users pay close attention to which firmware/driver combinations are compatible with the installation they are running. This is valid not only when building a SAN and attaching hosts to it, but also over the entire life span of the system (hosts and SAN components). Close attention should be paid to issues arising from applying service packs or vendor-specific patches and upgrades.

Hardware versus software zoning - Zoning can be implemented in hardware/firmware on controllers or in software on the hosts. It is recommended that controller-based zoning be used, since this allows a uniform implementation of access policy that cannot be interrupted or compromised by node disruption or failure of the software component.

Hardware versus software LUN masking - Some vendors also offer software-based masking facilities. Any such software that is closely attached to the storage and is involved in presenting the storage to the operating system needs to be certified. For the reasons mentioned above, software-based masking is not recommended if the stability of the software component cannot be guaranteed.

Boot from SAN - This is an increasingly demanded feature and is supported by Microsoft in Windows Server 2003. Some factors to consider are:

Configurations require support from the Host Bus Adapter (HBA) and storage vendors. The HBA driver needs to be a Storport driver. Storport, which is new in Windows Server 2003, improves performance over SCSIport, both in terms of throughput and in terms of the system resources that are utilized, and adds a manageability infrastructure for configuration and management of host-based RAID adapters. These features are required if the operating system is to successfully boot from a SAN, and, more importantly, vendors that have an implementation need to get it certified by Microsoft.

Such solutions have limited scaling capacity, and any additional complexities such as storage replication and recovery mechanisms need to be addressed by the hardware vendors.

Server clusters have enabled a feature that allows the startup disk, page file disks, and the cluster disks to be hosted on the same channel. There are other performance and operational implications that need to be considered before implementation.

Please also read Knowledge Base article 305547, "Support for Booting from a Storage Area Network (SAN)," which discusses this feature.

Storage Configuration and Setup

If this is a fresh cluster installation, you need to ensure that you do not have applications running. When creating a cluster or adding nodes to a cluster, the wizard enumerates all of the storage on the node and clusters all storage that is not determined to be non-clusterable. Non-clusterable storage includes all drives on any storage bus that has a system file on it (boot, system, page file, crash dump, or hibernation file), dynamic (LDM) disks, and drives that respond positively to the IOCTL_SCSI_MINIPORT_NOT_QUORUM_CAPABLE IOCTL.

If this is an existing cluster that is being upgraded, then all configuration settings are honored, and a rolling upgrade from Windows 2000 is a fully supported option.

Server Clusters and Fault-Tolerant Disks (RAID)

Creation of fault-tolerant volumes such as striped, mirrored, and RAID-5 volumes in Windows Server 2003 requires the volumes to be on dynamic disks. Dynamic disks are not supported out of the box in Server clusters; however, Veritas has a product (Veritas Volume Manager) that provides similar functionality and is supported. NTFS is the supported format type. If disks are not configured as basic disks and formatted with NTFS, clustering will not recognize the disks and will not be able to manage them. Server clustering also does not support GPT (GUID partition table) disks, which were introduced as part of the EFI initiative.
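As an informal pre-installation check, the disk type and file system can be confirmed from the command line (a minimal sketch; the disk number 2 and drive letter Z: are placeholders only):

    C:\> diskpart
    DISKPART> select disk 2
    DISKPART> detail disk
    DISKPART> exit
    C:\> format Z: /FS:NTFS

The detail disk output shows whether the disk is basic or dynamic; the shared volume must be a partition on a basic disk and formatted with NTFS before the cluster service can manage it.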

Clustering, however, supports hardware-based fault-tolerant disks. This means that the configuration of physical disks into fault-tolerant sets (JBOD, RAID 5, RAID 0+1, and so on) is done at the controller level, and the set can be made visible either as a whole or carved into smaller pieces (volumes) that are made available to hosts. The hosts are completely unaware of the physical implementation and treat each volume as a disk. It is for this reason that we recommend that usage characteristics be taken into consideration before implementing fault-tolerant sets. For example, if you think the logical volume will host data that is read and updated constantly, you should perhaps implement the fault-tolerant set as RAID 0+1 instead of RAID 5. Another rule of thumb is to keep the logical volume size as close as possible to the physical set size; carving out fewer logical volumes generally helps performance.
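As a rough illustration of why usage characteristics matter: a small random write to a RAID-5 set typically costs four disk operations (read old data, read old parity, write new data, write new parity), whereas the same write to a RAID 0+1 set costs two (one write to each side of the mirror). This is why write-intensive volumes often do better on RAID 0+1, while read-mostly volumes can take advantage of the lower capacity overhead of RAID 5.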

Volume expansion is also a regular requirement, since data growth is often not considered up front. Cluster disks can be extended without rebooting if the controller supports dynamic LUN expansion. This feature allows the physical expansion to be implemented without disruption, and users can use the diskpart tool provided by Microsoft to apply the change seamlessly at the logical level as well. There are separate versions of diskpart for Windows 2000 and Windows Server 2003: the Windows 2000 version is available as a free download on the web, and the Windows Server 2003 version ships on the distribution CD.
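For example, once the LUN has been expanded at the controller level, the NTFS volume on the cluster disk can be extended with diskpart (a minimal sketch; the volume number 3 is a placeholder):

    C:\> diskpart
    DISKPART> list volume
    DISKPART> select volume 3
    DISKPART> extend
    DISKPART> exit

The extend command grows the selected volume into the contiguous unallocated space that the LUN expansion made available, without requiring a reboot.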

Handling Storage Cable Disconnects in a Cluster

This document has been written to answer the various queries we receive about whether Microsoft supports manual disconnection of a storage cable as a valid test of failover mechanisms. Microsoft software fully supports the validity of the test, but Microsoft is not responsible for guaranteeing the results. The main reason for this is that the reactions of the devices concerned (HBAs, switches, storage solutions, and so on) are controlled by lower-level device drivers that are written by third-party vendors. Different vendors implement storage drivers in different ways, and it is up to them to implement the features that allow such tests to work. At the time of writing, not all vendors have device drivers that support this test. You should check with your vendors (storage and adapter) to ensure that pulling the storage adapter cable is correctly handled by their device drivers. If the drivers are written to the specifications made available by Microsoft, this situation will be handled cleanly and the tests should work.

This document is specific to Fibre Channel-based storage and the issue of storage cables being disconnected or broken, and it has been created to clearly state the position of the cluster product team in this regard. Similar issues can be seen with SCSI bus implementations, but since the majority of implementations these days are Fibre Channel based, we will scope our discussion to that type of storage network.

Essentially, there are usually two sets of connections between servers and storage: servers and storage solutions are each connected to switches or hubs. As such, there are two places where disconnects can occur: between the server (HBA) and the switch, and between the storage solution and the switch. Since there are many cluster and non-cluster components involved when such an event happens, we will try to map the events to the components, thus establishing clear areas of responsibility.

To begin with, the Server clusters product team fully understands that storage cable disconnects leading to disruption in host connectivity are a valid scenario and can happen at customer sites. The product (Server clusters) fully supports handling such events gracefully and failing over resources (disk and related). However, there is a set of events that needs to happen before Server clusters can detect the event and fail over resources. There is also an additional set of events that needs to happen for the resources to fail back to the original node. These events depend on the behavioral characteristics of certain components (lower-level device drivers) that are not part of the server cluster product. Unfortunately, there is no standardization of the required characteristics of these drivers, and as such the behavioral semantics might differ from configuration to configuration. But the basic behavior linked to a cable disconnect (that of a failure being detected, causing the disks to fail over) should always work.

We will state some of the requirements and issues in this document so as to give the reader an understanding of how all of this works together and of what questions to ask of whom to ensure that their systems have this functionality.

Disconnecting Storage Cables (HBA to Switch)

We will first address the issue of a storage cable between the server (HBA) and the switch being disconnected.

Here is a rough diagram of what the storage stack looks like on a server running Windows and Server clusters. The stack can include a SCSIport driver supplied by Microsoft and a miniport driver supplied by the HBA vendor, or a full port driver that provides the complete functionality and replaces the SCSIport driver:

[Figure] Case 1: The storage stack with the Microsoft-supplied SCSIport driver and a vendor-supplied miniport driver.

[Figure] Case 2: The storage stack with a vendor-supplied full port driver that replaces SCSIport.

In the first case, when the cable is disconnected, if the HBA miniport driver reports the correct status (BusChangeDetected, which indicates that a target device might have been added to or removed from a dynamic bus), the SCSIport driver can do the right thing, and the non-availability (or availability) of disks is efficiently detected. If a device is reported as not available, PnP will do a scan and tear down all device objects that are not available, and the server cluster disk driver will get the required notifications. This leads to the cluster service failing over the disk resources to another valid member of the cluster that has access to the devices in question.

The HBA miniport driver might instead report only a generic status (ResetDetected, which indicates only that the HBA has detected a reset on the SCSI bus; after this notification, the HBA miniport driver is still responsible for completing any active requests). In such a case, the cluster disk driver will eventually detect that the disk is no longer available in the course of the monitoring tests (LooksAlive, IsAlive) it performs on a periodic basis. This causes the cluster service to detect that the disks are no longer accessible, leading it to fail over the disks to another member of the cluster that has valid access to the disks in question. The minimum time required to detect this is approximately three seconds, when a reserve issued by clusdisk (the cluster disk driver) fails. The maximum time required is not quantifiable, since it depends on the behavioral and operational characteristics of the lower-level device drivers. But the non-availability of the disk will always eventually be detected, causing the disks to fail over to a valid cluster node.

The frequency for running LooksAlive and IsAlive is tunable, meaning one can increase the frequency of the checks so as to have a smaller detection window, but this has other (resource utilization) implications that need to be carefully considered, especially in enterprise environments where server clusters might need to monitor a large number of resources.
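As a hedged illustration of how the checks are tuned, the intervals are exposed as the LooksAlivePollInterval and IsAlivePollInterval resource properties (in milliseconds) and can be viewed or changed with cluster.exe; the resource name "Disk Q:" and the values shown here are placeholders only:

    C:\> cluster res "Disk Q:" /prop
    C:\> cluster res "Disk Q:" /prop LooksAlivePollInterval=30000
    C:\> cluster res "Disk Q:" /prop IsAlivePollInterval=120000

Shorter intervals shrink the detection window at the cost of more frequent polling of every resource that uses them.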

The same is also valid if the out-of-the-box SCSIport driver is replaced by a vendor-specific SCSIport driver, or if the SCSIport/miniport combination is replaced by a monolithic full port driver, as shown in Case 2. Such drivers are written by vendors to cater to specific requirements their storage solutions might have. Microsoft cannot guarantee the functionality of full port drivers, and Microsoft support will not be able to debug related issues if full port drivers are used.

Disconnecting Storage Cables (Switch to Storage)

Here again, the behavior depends on the implementation. Upon cable disconnect or device removal, the HBA gets an RSCN (registered state change notification) from the switch and should notify the operating system of any changes it detects. Detection is a very complex operation with many external dependencies, and it may not always work correctly. In the worst case, the HBA may fail the I/O or return BUSY. In such cases the worst-case scenario described above applies, which means that the LooksAlive/IsAlive checks will fail and the cluster service will either fail over the disks or fail them completely.

Reconnecting Storage Cables (HBA to Switch)

Reconnecting the storage cables and expecting the resources to fail back is also a complicated issue. If the HBA miniport driver or the full port driver sends the right status, PnP will rescan the bus, create device objects, and allocate resources (such as I/O ports, memory addresses, and interrupts) for each device. These resources are supplied back to the miniport, which must then use them. In such cases, reconnecting the storage cable is detected, and that is all that needs to happen to allow the resources to fail back to the original node.

If the miniport driver or the full port driver does not implement the above, failback should still just work, because the physical device objects exist. If there are issues, the user will need to initiate a manual rescan (by going into the Disk Management MMC snap-in and initiating a rescan of disks, or from the command line as sketched below). Once this is done, a failback should work.
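The rescan step can also be performed with diskpart instead of the Disk Management snap-in; this is simply an equivalent sketch of that manual step:

    C:\> diskpart
    DISKPART> rescan
    DISKPART> exit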

Reconnecting Storage Cables (Switch to Storage Controller)

Getting information about cable reconnects back from the switch is dependent on various conditions (whether devices have changed, whether the name server has detected the change, and so on) and does not always work as expected. A manual rescan should also work in such cases.

In either case, the full functionality is provided by the miniport or the full port driver (depending on which is being used), both of which are written by the storage or HBA vendor.

The above is also valid for fault-tolerant solutions that implement features such as multiple paths from servers to storage. The behavior should be the same in the event of both paths on a host failing. Single-path failures should be masked by the multi-path driver and should not be visible to higher-level subsystems.

To sum up, disconnecting storage cables will cause disks managed by server clusters to fail over to another member of the cluster, no matter what driver types are used. Failing the disks back to the original node, however, is behavior that can only be guaranteed by the HBA vendors, for the simple reason that they own the drivers that incorporate the functionality primarily responsible for correctly handling this situation.

Adding new disks to a cluster - Adding new disks to a cluster does not require rebooting nodes or restarting the cluster service after Windows 2000 SP3 or in Windows Server 2003. Any node can scan and mount the new disk to begin with, and once the disk has been formatted with NTFS, all that is required is the creation of a new physical disk resource using Cluster Administrator or the cluster.exe command-line tool, as sketched below.
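A minimal cluster.exe sketch, assuming a resource named "Disk Z:" and a group named "Disk Group 1" (both placeholders); the disk itself is identified to the resource through its private properties (for example, the disk signature), which can be reviewed with the /priv switch or set in Cluster Administrator:

    C:\> cluster res "Disk Z:" /create /group:"Disk Group 1" /type:"Physical Disk"
    C:\> cluster res "Disk Z:" /priv
    C:\> cluster res "Disk Z:" /online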

The above is valid post Windows 2000 SP3 (with two hotfixes) and in Windows Server 2003. The hotfixes are supplied by Microsoft Support.