Provide cost-effective storage for Hyper-V workloads by using Windows Server: planning and design guide

 

Updated: July 15, 2015

Applies To: System Center 2012, Windows Server 2012 R2

This guide describes how to plan and design one particular storage solution for compute clusters that host virtual machines running on Windows Server and Hyper-V as part of a cloud service platform. This software-defined storage solution uses an easily managed Windows Server file server cluster together with just-a-bunch-of-disks (JBOD) enclosures and Storage Spaces to deliver high-performance, cost-effective storage, eliminating the need for expensive SAN devices when implementing a cloud platform.

For a list of recent changes, see the Change History section at the end of this topic.

If you haven’t already, read Provide cost-effective storage for Hyper-V workloads by using Windows Server; it provides an introduction to this solution and is meant to be used with this topic.

We assume that you want to target an initial deployment of roughly 100 tenants (with eight virtual machines per tenant), with the ability to expand the solution to roughly 500 tenants over time. For more flexible and comprehensive design guidance, see Software-Defined Storage Design Considerations Guide.

Use the following steps and design decisions to plan for implementing Windows Server-based storage for Hyper-V workloads.

In this guide:

  • Step 1: Design the file server cluster

  • Step 2: Design the management cluster

  • Step 3: Design the compute cluster

  • Next steps

Step 1: Design the file server cluster

In this step, you design the file server cluster used to provide the storage to virtual machines in this solution.

1.1. Design the file server cluster hardware

Here are the hardware components we recommend for the file server clusters. Note that we recommend purchasing all production hardware from a vendor that tests and supports the hardware as an integrated solution with Storage Spaces.

Component

Guidelines

Storage enclosures

  • Four identical storage enclosures (240 disks total in four enclosures)

    With four enclosures, an entire enclosure can fail and the storage spaces remain online (assuming that there aren’t too many failed disks in the remaining enclosures).

  • SAS-connected 60-disk storage enclosures

  • Each storage enclosure must be connected to every node of the file server cluster through two SAS connections (one per host bus adapter (HBA))

    This maximizes performance and eliminates single points of failure. To support this requirement, each storage enclosure needs two SAS ports per cluster node (eight ports per enclosure with four nodes), and each cluster node needs two SAS ports per enclosure (eight ports per node with four enclosures).

Physical disks

  • 48 7200 rpm HDDs per storage enclosure (192 HDDs total in four enclosures)

    7,200 rpm HDDs provide lots of capacity while consuming less power and costing less than higher rotational speed HDDs, but they still provide good performance in this solution when matched with a sufficient number of SSDs.

    When using 4 TB HDDs and 800 GB SSDs in four 60-bay enclosures, this solution provides about 804 TB of raw storage pool capacity per file server cluster. After resiliency, storage for backups, and free space for repairing storage spaces are factored in, this yields roughly 164 TiB of space for compute and management virtual machines (a TiB is a terabyte calculated by using binary (base 2) notation instead of decimal (base 10) notation).

  • 12 SSDs per storage enclosure (48 SSDs total in four storage enclosures)

    Storage Spaces uses SSDs to create a faster storage tier for frequently accessed data. It also uses SSDs for a persistent write back cache that reduces the latency of random writes.

    For more information, see What's New in Storage Spaces in Windows Server.

  • All disks must be dual-port SAS disks

    This enables each disk to be connected to all nodes of the failover cluster via SAS expanders included in the storage enclosures.

File server clusters

  • One four-node file server cluster

    With four nodes, all storage enclosures are connected to all nodes and you can maintain good performance even if two nodes fail, reducing the urgency of maintenance.

  • One file server cluster hosts the storage for one compute cluster

    If you add a compute cluster, also add another four-node file server cluster. You can add up to four file server clusters and four compute clusters per management cluster. The first file server cluster also hosts the storage for the management cluster.

    Additional clusters (also called scale units) let you increase the scale of your environment to support more virtual machines and tenants.

Cluster nodes

  • Two six-core CPUs

    The file server cluster doesn’t need the most powerful CPUs because most traffic is handled by RDMA network cards, which process network traffic directly.

  • 64 GB of RAM

    You don’t need a lot of RAM because the file server cluster uses storage tiers, which prevent the use of the CSV cache (typically one of the largest consumers of RAM on a clustered file server).

  • Two HDDs set up in a RAID-1 (mirror) using a basic RAID controller

    This is where Windows Server is installed on each node. As an option, you can use one or two SSDs. SSDs cost more, but use less power and provide faster startup, setup, and recovery times as well as increased reliability. You can use a single SSD to reduce costs if you’re OK with reinstalling Windows Server on the node if the SSD fails.

Cluster node HBAs

  • Two identical four-port, 6 Gbps SAS HBAs

    Each HBA has one connection to every storage enclosure, so there are two connections in total to each storage enclosure. This maximizes throughput and provides redundant paths. The HBAs must not include built-in RAID functionality.

Cluster node network interface cards

  • One dual-port 10 gigabit Ethernet network interface card with RDMA support

    This card acts as the storage network interface between the file server cluster and the compute and management clusters, each of which stores its virtual hard disk files on the file server cluster.

    The card requires RDMA support to maximize performance, and iWARP support if you want to route traffic between racks of clusters, which can be relevant when you add compute and file server clusters to the solution. This card uses SMB 3 with SMB Direct and SMB Multichannel to maximize performance and provide fault tolerance, with each port connected to a separate subnet.

    For a list of certified network interface cards with RDMA support, see the Windows Server Catalog.

  • One dual-port gigabit or 10 gigabit Ethernet network interface card without RDMA support

    This card communicates between the management cluster and the file server cluster, with each port connected to a separate subnet. It doesn’t need RDMA support because it communicates with the Hyper-V virtual switches on the management and compute clusters, which can’t use RDMA communication.

    For a list of certified network interface cards, see the Windows Server Catalog.

  • One gigabit Ethernet network interface for remote management

    This integrated lights-out (ILO), baseboard management controller (BMC), or onboard networking adapter connects to your management network.

1.2. Design the file server cluster software configuration

Here are the software components we recommend for the file server clusters.

Technology

Guidelines

Operating system

  • Windows Server 2012 R2 Standard with the Server Core installation option

    Using Windows Server 2012 R2 Standard saves money over using a more expensive edition, and the Server Core installation option keeps the security footprint low, which in turn limits the number of software updates that you need to install on the file server cluster.

Failover Clustering

  • One Scale-Out File Server

    This clustered file server enables you to host continuously available file shares that are simultaneously accessible on multiple nodes.
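
  • Example: creating the Scale-Out File Server by using Windows PowerShell

    The following is a minimal sketch of creating the four-node cluster and the Scale-Out File Server role. The cluster, node, and role names (FSCLUSTER01, FS01 through FS04, SOFS01) are examples; adjust them for your environment.

        # Create the file server cluster without adding any storage yet.
        New-Cluster -Name FSCLUSTER01 -Node FS01,FS02,FS03,FS04 -NoStorage

        # Add the Scale-Out File Server role that hosts the continuously available shares.
        Add-ClusterScaleOutFileServerRole -Name SOFS01 -Cluster FSCLUSTER01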

MPIO

  • Enable Multipath I/O (MPIO) on each node

    This combines the multiple paths to physical disks in the storage enclosures, providing resiliency and load balancing across physical paths.
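
  • Example: enabling MPIO by using Windows PowerShell

    The following is a minimal sketch to run on each file server node. The Least Blocks (LB) load-balancing policy shown here is an assumption commonly used with SAS-attached Storage Spaces; confirm the policy that your hardware vendor recommends.

        # Install the MPIO feature, then claim all SAS-attached disks for MPIO.
        Install-WindowsFeature -Name Multipath-IO
        Enable-MSDSMAutomaticClaim -BusType SAS

        # Set the default load-balancing policy (LB = Least Blocks).
        Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LB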

Storage pools

  • Three clustered storage pools per file server cluster

    Using multiple smaller pools instead of one large pool helps minimize the time required to fail over a storage pool to another node.

  • For each of the two workload pools, use 5 SSDs and 16 HDDs from each of the four storage enclosures, for a total of 84 disks per pool for your primary workloads.

    This provides enough SSDs to enable you to create the appropriate storage spaces, with the data distributed across the storage enclosures so that any storage enclosure can fail without resulting in downtime for your tenants (as long as there aren’t too many failed disks in the remaining storage enclosures).

  • 2 SSDs and 16 HDDs from each of the four storage enclosures for a backup pool, with a total of 72 disks in this pool.

    The SSDs in the backup pool are designated as journal disks to enhance the write performance of the virtual disks, which use the dual-parity resiliency type.

  • No hot spare disks

    Instead, always keep at least 21.9 TiB of free HDD space in each of the storage pools, plus 1.5 TiB of free SSD space in each of the workload pools. This enables Storage Spaces to automatically rebuild storage spaces after up to one failed SSD and three failed HDDs by copying data to multiple disks in the pool, drastically reducing the time it takes to recover from failed disks when compared to using hot spares.

    In this solution with 4 TB HDDs and 800 GB SSDs, this means keeping 23.4 TiB of free space per workload pool (21.9 TiB of HDD space plus 1.5 TiB of SSD space).

    For more information on how we come up with these numbers, see Software-Defined Storage Design Considerations Guide and the Software-Defined Storage Design Calculator.
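
  • Example: creating a workload storage pool by using Windows PowerShell

    The following is a simplified sketch of creating one workload pool. It doesn't show selecting exactly 5 SSDs and 16 HDDs from each enclosure, and the pool name and storage subsystem name pattern are examples; adjust the disk selection and names for your environment.

        # Pick the poolable disks intended for this pool (in practice, select them per enclosure).
        $disks = Get-PhysicalDisk -CanPool $true | Select-Object -First 84

        # Create the clustered pool; enclosure awareness is set as the default for new spaces.
        New-StoragePool -FriendlyName WorkloadPool1 `
            -StorageSubSystemFriendlyName "Clustered*" `
            -PhysicalDisks $disks `
            -EnclosureAwareDefault $true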

Storage spaces

  • Eight storage spaces per workload storage pool

    This distributes load across each node in the cluster (two storage spaces per node, per pool).

  • Use three-way mirror spaces for workload data

    Mirror spaces provide the best performance and data resiliency for hosting virtual machines. Three-way mirror spaces ensure that there are at least three copies of data, allowing any two disks to fail without data loss. We don’t recommend parity spaces for hosting virtual machines due to their performance characteristics.

  • Use the following settings to construct your three-way mirror spaces with storage tiers, the default write-back cache size, and enclosure awareness. We recommend four columns for this configuration for a balance of high throughput and low latency. (A Windows PowerShell sketch that applies these settings appears at the end of this list.)

    For more information, see Software-Defined Storage Design Considerations Guide.

    ResiliencySettingName: Mirror

    NumberOfDataCopies: 3

    NumberOfColumns: 4

    StorageTierSizes: SSD: 0.54 TiB; HDD: 8.79 TiB (assuming 800 GB SSDs and 4 TB HDDs)

    IsEnclosureAware: $true

  • All storage spaces use fixed provisioning

    Fixed provisioning enables you to use storage tiers and failover clustering, neither of which work with thin provisioning.

  • Create one additional 4 GB two-way mirror space without storage tiers

    This storage space is used as a witness disk for the file server cluster, and is used for file share witnesses for the management and compute clusters. This helps the file server cluster maintain its integrity (quorum) in the event of two failed nodes or network issues between nodes.

  • For your backup pool, use the following settings to create 16 virtual disks using the dual-parity resiliency type and 7 columns.

    ResiliencySettingName: Parity

    NumberOfDataCopies: 3

    Size: 7.53 TiB

    NumberOfColumns: 7

    IsEnclosureAware: $true
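
  • Example: creating the storage spaces by using Windows PowerShell

    The following is a minimal sketch that applies the settings above to one workload space and one backup virtual disk. The pool, tier, and virtual disk names are examples, and the PowerShell TB suffix is binary, so 0.54TB and 8.79TB correspond to the TiB values in the tables above.

        # Define the SSD and HDD tiers in the workload pool (once per pool).
        $ssdTier = New-StorageTier -StoragePoolFriendlyName WorkloadPool1 -FriendlyName SSDTier -MediaType SSD
        $hddTier = New-StorageTier -StoragePoolFriendlyName WorkloadPool1 -FriendlyName HDDTier -MediaType HDD

        # Create one enclosure-aware, fixed-provisioned, three-way mirror space with storage tiers.
        New-VirtualDisk -StoragePoolFriendlyName WorkloadPool1 -FriendlyName Space01 `
            -ResiliencySettingName Mirror -NumberOfDataCopies 3 -NumberOfColumns 4 `
            -StorageTiers $ssdTier,$hddTier -StorageTierSizes 0.54TB,8.79TB `
            -IsEnclosureAware $true -ProvisioningType Fixed

        # For the backup pool, first designate its SSDs as journal disks, for example:
        # Get-PhysicalDisk <backup pool SSDs> | Set-PhysicalDisk -Usage Journal

        # Then create each dual-parity virtual disk (PhysicalDiskRedundancy 2 requests
        # dual parity, which is reported as NumberOfDataCopies 3).
        New-VirtualDisk -StoragePoolFriendlyName BackupPool1 -FriendlyName Backup01 `
            -ResiliencySettingName Parity -PhysicalDiskRedundancy 2 -NumberOfColumns 7 `
            -Size 7.53TB -IsEnclosureAware $true -ProvisioningType Fixed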

Partitions

  • One GPT partition per storage space

    This helps keep the solution simpler.

Volumes

  • One volume formatted with the NTFS file system per partition/storage space

    ReFS isn’t recommended for this solution in this release of Windows Server.

  • Enable Data Deduplication on the virtual disks used for storing backups.
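
  • Example: creating the partition and volume by using Windows PowerShell

    The following is a minimal sketch for one storage space. The virtual disk name is an example, and the 64 KB allocation unit size is an assumption often used for virtual hard disk workloads rather than a requirement of this solution.

        # Initialize the virtual disk with GPT, create one partition, and format it with NTFS.
        Get-VirtualDisk -FriendlyName Space01 | Get-Disk |
            Initialize-Disk -PartitionStyle GPT -PassThru |
            New-Partition -UseMaximumSize -AssignDriveLetter |
            Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -Confirm:$false

        # On the backup volumes only, enable Data Deduplication (the drive letter is an example).
        Install-WindowsFeature -Name FS-Data-Deduplication
        Enable-DedupVolume -Volume "E:"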

CSV

  • One CSV volume per volume (with one volume and partition per storage space)

    This enables the load to be distributed to all nodes in the file server cluster. Don’t create a CSV volume on the 4 GB storage space used to maintain cluster quorum.
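
  • Example: adding a volume to Cluster Shared Volumes by using Windows PowerShell

    The following is a minimal sketch; the cluster disk resource name is an example, so check the actual resource name with Get-ClusterResource first.

        # Add the clustered virtual disk to Cluster Shared Volumes.
        Add-ClusterSharedVolume -Name "Cluster Virtual Disk (Space01)"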

BitLocker Drive Encryption

  • Test BitLocker Drive Encryption performance before using widely

    You can use BitLocker Drive Encryption to encrypt all data in storage on each CSV volume, improving physical security, but doing so can have a significant performance impact on the solution.

Continuously available file shares

  • One continuously available SMB file share per CSV volume/volume/partition/storage space

    This makes management simpler (one share per underlying storage space), and enables the load to be distributed to all nodes in the file server cluster.

  • Test the performance of encrypted data access (SMB 3 encryption) on file shares before deploying widely

    You can use SMB 3 encryption to help protect data on file shares that require protection from physical security breaches where an attacker has access to the datacenter network, but doing so eliminates most of the performance benefits of using RDMA network adapters.
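
  • Example: creating a continuously available file share by using Windows PowerShell

    The following is a minimal sketch. The folder path, share name, and account names are examples, and the EncryptData line applies only if you decide to use SMB 3 encryption after testing its performance impact.

        # Create a folder on the CSV volume and share it with continuous availability.
        New-Item -ItemType Directory -Path C:\ClusterStorage\Volume1\Shares\VMStore01
        New-SmbShare -Name VMStore01 -Path C:\ClusterStorage\Volume1\Shares\VMStore01 `
            -ContinuouslyAvailable $true `
            -FullAccess "CONTOSO\Hyper-V-Hosts","CONTOSO\HyperVAdmins"

        # Optional: require SMB 3 encryption on this share (test performance first).
        Set-SmbShare -Name VMStore01 -EncryptData $true -Force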

Updates

  • Use Windows Server Update Services in conjunction with Virtual Machine Manager

    Create three or four computer groups in Windows Server Update Services (WSUS) for the file server nodes, adding one or two nodes to each group. With this setup, you can update one server first and monitor its functionality, then update the rest of the servers one at a time so that load continues to be balanced across the remaining servers.

    For more information, see Managing Fabric Updates in VMM (or Deploy Windows Server Update Services in Your Organization if you’re not using Virtual Machine Manager).

  • Use Cluster-Aware Updating for UEFI and firmware updates

    Use Cluster-Aware Updating to update anything that can’t be distributed via WSUS. This probably means the BIOS (UEFI) for the cluster nodes along with the firmware for network adapters, SAS HBAs, drives, and the storage enclosures.
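
  • Example: an on-demand Cluster-Aware Updating run by using Windows PowerShell

    The following is a minimal sketch that uses the CAU hotfix plug-in to apply vendor firmware and driver packages staged on a file share. The cluster name and share path are examples, and the hotfix folder must be prepared in the layout the plug-in expects.

        # Update the file server cluster one node at a time, draining roles as it goes.
        Invoke-CauRun -ClusterName FSCLUSTER01 -CauPluginName Microsoft.HotfixPlugin `
            -CauPluginArguments @{ "HotfixRootFolderPath" = "\\MGMT01\Hotfixes" } `
            -MaxFailedNodes 1 -RequireAllNodesOnline -Force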

Data Protection Manager

  • You can use Data Protection Manager (DPM) to provide crash-consistent backups of the file server cluster. You can also use DPM and Hyper-V replication for disaster recovery of virtual machines on the compute cluster.

Step 2: Design the management cluster

In this step, you design the management cluster that runs all of the management and infrastructure services for the file server and compute clusters.

Note

This solution assumes that you want to use the System Center suite of products, which provides powerful tools to streamline setting up, managing, and monitoring this solution. However, you can alternatively accomplish all tasks by using Windows PowerShell and Server Manager (though you’ll probably find Windows PowerShell to be more appropriate due to the scale of this solution). If you choose to forgo System Center, you probably don’t need as powerful a management cluster as described here, and you might be able to use existing servers or clusters.

2.1. Design the management cluster hardware

Here are the hardware components we recommend for the cluster that runs all of the management and infrastructure services for the file server and compute clusters.

Component

Guidelines

Management cluster

  • One 4-node failover cluster

    Four nodes enable the management cluster to tolerate the failure of one node while still running all of the management virtual machines; use six nodes if you want to be resilient to two node failures. One management cluster running Virtual Machine Manager can support up to 8,192 virtual machines.

Cluster nodes

  • Two eight-core CPUs

    The virtual machines on this cluster do a significant amount of processing, requiring a bit more CPU power than the file server cluster.

  • 128 GB of RAM

    Running the management virtual machines requires more RAM than is needed by the file server cluster.

  • Two HDDs set up in a RAID-1 (mirror) using a basic RAID controller

    This is where Windows Server is installed on each node. As an option, you can use one or two SSDs. SSDs cost more, but use less power and provide faster startup, setup, and recovery times as well as increased reliability. You can use a single SSD to reduce costs if you’re OK with reinstalling Windows Server on the node if the SSD fails.

Network interface cards

  • One dual-port 10 gigabit Ethernet network interface card with RDMA support

    This card communicates between the management cluster and the file server cluster for access to the .vhdx files used by the management virtual machines. The card requires RDMA support to maximize performance, and iWARP support if you want to route traffic between racks of file server and management clusters, which can be relevant when you add file server clusters to the solution. This card uses SMB 3 with SMB Direct and SMB Multichannel to maximize performance and provide fault tolerance, with each port connected to a separate subnet.

    For a list of certified network interface cards with RDMA support, see the Windows Server Catalog.

  • One dual-port gigabit or 10 gigabit Ethernet network interface card without RDMA support

    This card handles management traffic between all clusters. The card requires support for Virtual Machine Queue (VMQ), Dynamic VMQ, 802.1Q VLAN tagging, and GRE offload (NVGRE). The card uses NIC Teaming to make its two ports, each connected to a separate subnet, fault tolerant.

    The card can’t make use of RDMA because RDMA requires direct access to the network card, and this card needs to communicate with Hyper-V virtual switches (which obscure direct access to the network card). It uses NIC Teaming for fault tolerance instead of SMB Multichannel so that protocols other than SMB can make use of the redundant network connections. You should use Quality of Service (QoS) rules to prioritize traffic on this connection. (A sketch of creating the NIC team appears after this list.)

    For a list of certified network interface cards with NVGRE support, see the Windows Server Catalog.

  • One gigabit Ethernet network interface for remote management

    This integrated lights-out (ILO), baseboard management controller (BMC), or onboard networking adapter connects to your management network.
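
  • Example: teaming the non-RDMA ports by using Windows PowerShell

    The following is a minimal sketch of creating the NIC team on a cluster node. The adapter and team names are examples, and the switch-independent mode with dynamic load balancing is a common choice rather than a requirement of this solution.

        # Team the two non-RDMA ports for fault tolerance.
        New-NetLbfoTeam -Name ManagementTeam -TeamMembers "NIC 3","NIC 4" `
            -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic -Confirm:$false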

2.2. Design the management cluster software configuration

The following list describes at a high level the software components we recommend for the management cluster:

  • Windows Server 2012 R2 Datacenter

  • Failover Clustering

  • Cluster-Aware Updating

  • Hyper-V

The following list describes at a high level the services that you should run in virtual machines on the management cluster:

  • Active Directory Domain Services (AD DS), DNS Server, and DHCP Server

  • Windows Server Update Services

  • Windows Deployment Services

  • Microsoft SQL Server

  • System Center Virtual Machine Manager

  • System Center Virtual Machine Manager Library Server

  • System Center Operations Manager

  • System Center Data Protection Manager

  • A management console (Windows Server with the GUI installation option)

  • Additional virtual machines are required depending on the services you’re using, such as Windows Azure Pack, and System Center Configuration Manager.

Note

Create identical virtual switches on all nodes so that each virtual machine can fail over to any node and maintain its connection to the network.
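
For example, the following Windows PowerShell sketch creates the same external virtual switch on every management node; the node, team, and switch names are examples.

    # Create an identical Hyper-V virtual switch on each node, bound to the NIC team,
    # with weight-based minimum bandwidth so QoS rules can prioritize traffic.
    Invoke-Command -ComputerName MGMT01,MGMT02,MGMT03,MGMT04 -ScriptBlock {
        New-VMSwitch -Name TenantSwitch -NetAdapterName ManagementTeam `
            -AllowManagementOS $true -MinimumBandwidthMode Weight
    }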

Step 3: Design the compute cluster

In this step, you design the compute cluster that runs the virtual machines that provide services to tenants.

3.1. Design the compute cluster hardware

Here are the hardware components we recommend for the compute clusters. These clusters house tenant virtual machines.

Component

Guidelines

Hyper-V compute clusters

  • Each compute cluster contains 32 nodes and hosts up to 2,048 Hyper-V virtual machines. When you’re ready to add capacity, you can add up to three additional compute clusters (and their associated file server clusters), for a total of 128 nodes hosting 8,192 virtual machines for 512 tenants (assuming eight virtual machines per tenant).

    See Hyper-V scalability in Windows Server 2012 and Windows Server 2012 R2 for more information.

Cluster nodes

  • Two eight-core CPUs

    Two eight-core CPUs are sufficient for a general mix of workloads, but if you intend to run a lot of computation heavy workloads in your tenant virtual machines, select higher performance CPUs.

  • 128 GB of RAM

    Running the large number of virtual machines (probably 64 per node while all nodes of the cluster are running) requires more RAM than is needed by the file server cluster. Use more RAM if you want to provide more than 2 GB per virtual machine on average.

  • Two HDDs set up in a RAID-1 (mirror) using a basic RAID controller

    This is where Windows Server is installed on each node. As an option, you can use one or two SSDs. SSDs cost more, but use less power and provide faster startup, setup, and recovery times as well as increased reliability. You can use a single SSD to reduce costs if you’re OK with reinstalling Windows Server on the node if the SSD fails.

Network interface cards

  • One dual-port 10 gigabit Ethernet network interface card with RDMA support

    This card communicates with the file server cluster for access to the .vhdx files used by virtual machines. The card requires RDMA support to maximize performance, and iWARP support if you want to route traffic between racks of file server and compute clusters, which can be relevant when you add file server clusters to the solution. This card uses SMB 3 with SMB Direct and SMB Multichannel to maximize performance and provide fault tolerance, with each port connected to a separate subnet.

    For a list of certified network interface cards with RDMA support, see the Windows Server Catalog.

  • One dual-port gigabit or 10 gigabit Ethernet network interface card without RDMA support

    This card handles management and tenant traffic. The card requires support for Virtual Machine Queue (VMQ), Dynamic VMQ, 802.1Q VLAN tagging, and GRE offload (NVGRE). The card uses NIC Teaming to make its two ports, each connected to a separate subnet, fault tolerant.

    The card can’t make use of RDMA because RDMA requires direct access to the network card, and this card needs to communicate with Hyper-V virtual switches (which obscure direct access to the network card). It uses NIC Teaming for fault tolerance instead of SMB Multichannel so that protocols other than SMB can make use of the redundant network connections. You should use Quality of Service (QoS) rules to prioritize traffic on this connection.

    For a list of certified network interface cards with NVGRE support, see the Windows Server Catalog.

  • One gigabit Ethernet network interface for remote management

    This integrated lights-out (ILO), baseboard management controller (BMC), or onboard networking adapter connects to your management network and enables you to use System Center Virtual Machine Manager to set up the cluster node from bare-metal hardware. The interface must have support for Intelligent Platform Management Interface (IPMI) or Systems Management Architecture for Server Hardware (SMASH).

3.2. Design the compute cluster software configuration

The following list describes at a high level the software components we recommend for the compute cluster:

  • Windows Server 2012 R2 Datacenter

  • Failover Clustering

  • Hyper-V

  • Data Center Bridging

  • Cluster-Aware Updating
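
For example, the following Windows PowerShell sketch shows one way to use Data Center Bridging to prioritize SMB storage traffic on the RDMA-capable ports of each compute node. The priority value, bandwidth percentage, and adapter names are examples; align them with your switch configuration.

    # Install DCB, tag SMB Direct (port 445) traffic with priority 3, reserve
    # bandwidth for that traffic class, and enable priority flow control for it.
    Install-WindowsFeature -Name Data-Center-Bridging
    New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
    New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
    Enable-NetQosFlowControl -Priority 3
    Enable-NetAdapterQos -Name "RDMA 1","RDMA 2"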

Next steps

After you have completed the planning steps, see What are the high-level steps to implement this solution?


Change History

  • July 15, 2015: Updated guidance for virtual disk design, and added links to the Software-Defined Storage Design Considerations Guide, which provides more detailed and up-to-date storage design information.

  • June 18, 2014: Updated guidance around how much free space to set aside in each pool for rebuilding storage spaces, and updated virtual disk sizes and other numbers accordingly.

  • April 2, 2014: Removed Windows Catalog links to SAS disks and SAS HBAs because the links were confusing.

  • January 22, 2014: Preliminary publication.