
Provide cost-effective storage for Hyper-V workloads by using Windows Server: planning and design guide

Published: January 22, 2014

Updated: June 18, 2014

Applies To: System Center 2012 R2, Windows Server 2012 R2

This guide describes how to plan and design one particular storage solution for compute clusters that host virtual machines running on Windows Server and Hyper-V as part of a cloud service platform. This software-defined storage solution uses an easily managed Windows Server file server cluster in conjunction with just-a-bunch-of-disks (JBOD) enclosures and Storage Spaces for high performance, cost-effective storage, obviating the need for expensive SAN devices when implementing a cloud platform.

Note
We’re still refining this solution, so check back in the coming weeks for updates.

For a list of recent changes to this topic, see the Change History section of this topic.

If you haven’t already, read the Provide cost-effective storage for Hyper-V workloads by using Windows Server topic; it provides an introduction to this solution and is meant to be used together with this guide.

We assume that you want to target an initial deployment of roughly 100 tenants (with eight virtual machines per tenant), with the ability to expand the solution to roughly 500 tenants over time. You can deploy smaller and larger solutions with a similar design, but we don’t explicitly describe them here, so you’d have to do some work to adjust the scale.

Use the following steps and design decisions to plan for implementing Windows Server-based storage for Hyper-V workloads.

In this guide:

  • Step 1: Design the file server cluster

  • Step 2: Design the management cluster

  • Step 3: Design the compute cluster

Step 1: Design the file server cluster

In this step, you design the file server cluster used to provide storage to the virtual machines in this solution.

Here are the hardware components we recommend for the file server clusters:

 

Component | Guidelines

JBOD enclosures

  • Four identical JBOD enclosures that are certified for use with Storage Spaces (240 disks total in four JBODs)

    With four enclosures, an entire enclosure can fail and the storage spaces will remain online (assuming that there aren’t too many failed disks in the remaining enclosures).

    For more information, see Enclosure Awareness Support - Tolerating an Entire Enclosure Failing.

  • SAS-connected 60-disk JBODs that comply with Windows Certification requirements

    For a list of certified JBODs, see the Windows Server Catalog.

  • Each JBOD must be connected to every node of the file server cluster through two SAS connections (one per HBA)

    This maximizes performance and eliminates a single point of failure. To support this requirement, each JBOD ideally has twice as many SAS ports as there are cluster nodes, and each node has twice as many SAS ports as there are JBODs (8 ports on each JBOD and 8 ports on each node in this design).

Physical disks

  • 48 7200 rpm HDDs per JBOD (192 HDDs total in four JBODs)

    7,200 rpm HDDs provide lots of capacity while consuming less power and costing less than higher rotational speed HDDs, but they still provide good performance in this solution when matched with a sufficient number of SSDs.

    When using 4 TB HDDs and 200 GB SSDs in four 60-bay JBODs, this solution provides about 724 TB of raw storage pool capacity per file server cluster. After resiliency and free space for rebuilding storage spaces are factored in, this yields roughly 225 TB of space for compute and management virtual machines.

  • 12 SSDs per JBOD (48 SSDs total in four JBODs)

    Storage Spaces uses SSDs to create a faster storage tier for frequently accessed data. It also uses SSDs for a persistent write-back cache that reduces the latency of random writes.

    For more information, see What's New in Storage Spaces in Windows Server 2012 R2.

  • All disks must be dual-port SAS disks

    This enables each disk to be connected to all nodes of the failover cluster via SAS expanders included in the JBODs.

File server clusters

  • One four-node file server cluster

    With four nodes, all JBODs are connected to all nodes and you can maintain good performance even if two nodes fail, reducing the urgency of maintenance.

  • One file server cluster hosts the storage for one compute cluster

    If you add a compute cluster, also add another four-node file server cluster. You can add up to four file server clusters and four compute clusters per management cluster. The first file server cluster also hosts the storage for the management cluster.

    Additional clusters (also called scale units) let you increase the scale of your environment to support more virtual machines and tenants.

Cluster nodes

  • Two six-core CPUs

    The file server cluster doesn’t need the most powerful CPUs because most traffic is handled by RDMA network cards, which process network traffic directly.

  • 64 GB of RAM

    You don’t need a lot of RAM because the file server cluster uses storage tiers, which prevents use of the CSV cache (typically one of the largest consumers of RAM on a clustered file server).

  • Two HDDs set up in a RAID-1 (mirror) using a basic RAID controller

    This is where Windows Server is installed on each node. As an option, you can use one or two SSDs. SSDs cost more, but use less power and provide faster startup, setup, and recovery times as well as increased reliability. You can use a single SSD to reduce costs if you’re OK with reinstalling Windows Server on the node if the SSD fails.

Cluster node HBAs

  • Two identical 4-port 6 Gbps SAS HBAs

    Each HBA has one connection to every JBOD, so there are two connections in total to every JBOD. This maximizes throughput and provides redundant paths. The HBAs must not have built-in RAID functionality.

Cluster node network interface cards

  • One dual-port 10 gigabit Ethernet network interface card with RDMA support

    This card acts as the storage network interface between the file server cluster and the compute and management clusters, each of which store their virtual hard disk files on the file server cluster.

    The card requires RDMA support to maximize performance, and iWARP support if you want to route storage traffic between racks of clusters, which can be relevant when adding more compute and file server clusters to the solution. This card uses SMB Multichannel and SMB Direct (both part of SMB 3) to provide fault tolerance and high throughput, with each port connected to a separate subnet.

    For a list of certified network interface cards with RDMA support, see the Windows Server Catalog.

  • One dual-port gigabit or 10 gigabit Ethernet network interface card without RDMA support

    This card communicates between the management cluster and the file server cluster, with each port connected to a separate subnet. It doesn’t need RDMA support because it communicates with the Hyper-V virtual switches on the management and compute clusters, which can’t use RDMA communication.

    For a list of certified network interface cards, see the Windows Server Catalog.

  • One gigabit Ethernet network interface for remote management

    This integrated lights-out (ILO), baseboard management controller (BMC), or onboard networking adapter connects to your management network.
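
Before moving on to the software configuration, it can help to confirm from one of the file server nodes that the SAS fabric looks the way this design expects. The following Windows PowerShell sketch is illustrative only; the expected counts (four enclosures, 192 HDDs, 48 SSDs) assume the example hardware described above.

    # List the JBOD enclosures visible to this node; expect four.
    Get-StorageEnclosure | Format-Table FriendlyName, NumberOfSlots, HealthStatus

    # Count the disks that are available for pooling, by media type;
    # expect 192 HDDs and 48 SSDs once MPIO has combined the dual SAS paths.
    Get-PhysicalDisk -CanPool $true | Group-Object MediaType | Format-Table Name, Count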

Here are the software components we recommend for the file server clusters.

 

Technology | Guidelines

Operating system

  • Windows Server 2012 R2 Standard with the Server Core installation option

    Using Windows Server 2012 R2 Standard saves money over using a more expensive edition, and the Server Core installation option keeps the security footprint low, which in turn limits the number of software updates that you need to install on the file server cluster.

Failover Clustering

  • One Scale-Out File Server

    This clustered file server enables you to host continuously available file shares that are simultaneously accessible on multiple nodes.
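
As an example of how this comes together, the following Windows PowerShell sketch creates the four-node cluster and adds the Scale-Out File Server role. The node names, cluster name, role name, and IP address are placeholders for your own values.

    # Validate and create the four-node file server cluster (placeholder names).
    Test-Cluster -Node FS01, FS02, FS03, FS04
    New-Cluster -Name FSCLUSTER01 -Node FS01, FS02, FS03, FS04 -StaticAddress 10.0.1.50

    # Add the Scale-Out File Server role that will host the continuously available shares.
    Add-ClusterScaleOutFileServerRole -Name SOFS01 -Cluster FSCLUSTER01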

MPIO

  • Enable Multipath I/O (MPIO) on each node

    This combines the multiple paths to physical disks in the JBODs, providing resiliency and load balancing across physical paths.
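
For example, you might enable this on each node with the following sketch; a restart may be required before the claim takes effect.

    # Install Multipath I/O and let the Microsoft DSM claim the SAS-attached disks,
    # so the two SAS paths to each physical disk are combined into one.
    Install-WindowsFeature -Name Multipath-IO
    Enable-MSDSMAutomaticClaim -BusType SAS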

Storage pools

  • Three clustered storage pools per file server cluster

    This helps minimize the time required to fail over a storage pool to another node.

  • Four SSDs and 16 HDDs from each of the four JBODs, for a total of 80 disks per pool

    This provides enough SSDs to enable you to create the appropriate storage spaces, with the data distributed across the JBODs so that any JBOD can fail without resulting in downtime for your tenants (as long as there aren’t too many failed disks in the remaining JBODs).

  • No hot spare disks

    Instead, keep the total free space in each of the storage pools equivalent to one HDD plus 8 GB (for storage pool and storage spaces overhead) and one SSD plus 8 GB per enclosure. This enables Storage Spaces to automatically rebuild storage spaces with up to two failed disks by copying data to multiple disks in the pool, drastically reducing the time it takes to recover from a failed disk when compared to using hot spares.

    In this solution with 4 TB HDDs and 200 GB SSDs, this means keeping 15.8 TB of free space per storage pool (47.4 TB total across the three storage pools).

    For more information, see What's New in Storage Spaces in Windows Server 2012 R2.
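
A minimal sketch of creating one of the three clustered pools follows. The pool name is a placeholder, the wildcard used to find the clustered storage subsystem may need adjusting for your environment, and the disk selection is deliberately simplified: in practice you would pick 16 HDDs and 4 SSDs from each enclosure (80 disks) using whatever enclosure-identifying property your JBODs report.

    # Placeholder disk selection: take 80 poolable disks. Replace this with logic
    # that selects 16 HDDs and 4 SSDs from each of the four enclosures.
    $disks = Get-PhysicalDisk -CanPool $true | Select-Object -First 80

    # Create the clustered storage pool; no hot spares are added, because free
    # space in the pool is reserved for automatic rebuilds instead (see above).
    $subsystem = Get-StorageSubSystem -FriendlyName '*Cluster*'
    New-StoragePool -FriendlyName Pool01 `
                    -StorageSubSystemFriendlyName $subsystem.FriendlyName `
                    -PhysicalDisks $disks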

Storage spaces

  • Eight storage spaces per storage pool

    This distributes load across each node in the cluster (two storage spaces per node, per pool).

  • Use three-way mirror spaces

    Mirror spaces provide the best performance and data resiliency for hosting virtual machines. Three-way mirror spaces ensure that there are at least three copies of data, allowing any two disks to fail without data loss. We don’t recommend parity spaces for hosting virtual machines due to their performance characteristics.

  • Use the following settings to construct your three-way mirror spaces with storage tiers, the default write-back cache size, and enclosure awareness. We recommend four columns because this yields a high level of performance without requiring too many SSDs. You might have to adjust the tier sizes depending on your actual pool capacity, after reserving capacity to automatically rebuild failed storage spaces (see the storage pools guidelines above).

    For more information, see Storage Spaces Frequently Asked Questions.

     

    Setting                Value
    ResiliencySettingName  Mirror
    NumberOfDataCopies     3
    NumberOfColumns        4
    StorageTierSizes       SSD: 91.2 GB; HDD: 9.3 TB (assuming 200 GB SSDs and 4 TB HDDs)
    IsEnclosureAware       $true

  • All storage spaces use fixed provisioning

    Fixed provisioning enables you to use storage tiers and failover clustering, neither of which works with thin provisioning.

  • Create one additional 3 GB two-way mirror space without storage tiers

    This storage space is used as a witness disk for the file server cluster, and is used for file share witnesses for the management and compute clusters. This helps the file server cluster maintain its integrity (quorum) in the event of two failed nodes or network issues between nodes.
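
The settings in the table above map directly onto the Storage Spaces cmdlets. The following sketch creates the two tiers in a pool, one tiered three-way mirror space, and the small witness space; friendly names are placeholders, and the tier sizes shown assume the 200 GB SSD / 4 TB HDD example and may need adjusting to your actual pool capacity.

    # Define the SSD and HDD tiers once per pool.
    New-StorageTier -StoragePoolFriendlyName Pool01 -FriendlyName SSDTier -MediaType SSD
    New-StorageTier -StoragePoolFriendlyName Pool01 -FriendlyName HDDTier -MediaType HDD

    # One of the eight tiered, enclosure-aware, three-way mirror spaces per pool.
    # The default 1 GB write-back cache is used because -WriteCacheSize isn't specified.
    $ssd = Get-StorageTier -FriendlyName SSDTier
    $hdd = Get-StorageTier -FriendlyName HDDTier
    New-VirtualDisk -StoragePoolFriendlyName Pool01 -FriendlyName Space01 `
                    -ResiliencySettingName Mirror -NumberOfDataCopies 3 -NumberOfColumns 4 `
                    -StorageTiers $ssd, $hdd -StorageTierSizes 91.2GB, 9.3TB `
                    -IsEnclosureAware $true -ProvisioningType Fixed

    # The 3 GB two-way mirror (no tiers) used as the cluster witness disk.
    New-VirtualDisk -StoragePoolFriendlyName Pool01 -FriendlyName Witness `
                    -ResiliencySettingName Mirror -NumberOfDataCopies 2 `
                    -Size 3GB -ProvisioningType Fixed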

Partitions

  • One GPT partition per storage space

    This helps keep the solution simpler.

Volumes

  • One volume formatted with the NTFS file system per partition/storage space

    ReFS isn’t recommended for this solution in this release of Windows Server.

CSV

  • One CSV volume per volume (with one volume and partition per storage space)

    This enables the load to be distributed to all nodes in the file server cluster. Don’t create a CSV volume on the 3 GB storage space used to maintain cluster quorum.
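
A sketch of taking one new space from virtual disk to CSV follows; run it from the cluster node that currently owns the virtual disk. The friendly name, volume label, and cluster disk resource name are placeholders, and the resource name in particular varies with how the cluster names the disk.

    # Initialize the virtual disk with a single GPT partition and format it with NTFS.
    $disk = Get-VirtualDisk -FriendlyName Space01 | Get-Disk
    Initialize-Disk -Number $disk.Number -PartitionStyle GPT
    New-Partition -DiskNumber $disk.Number -UseMaximumSize |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel Space01 -Confirm:$false

    # Add the corresponding cluster disk to Cluster Shared Volumes. Skip this for
    # the 3 GB witness space, which stays a regular cluster disk for quorum.
    Add-ClusterSharedVolume -Name 'Cluster Virtual Disk (Space01)'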

BitLocker Drive Encryption

  • Test BitLocker Drive Encryption performance before using widely

    You can use BitLocker Drive Encryption to encrypt all data in storage on each CSV volume, improving physical security, but doing so can have a significant performance impact on the solution.

Continuously available file shares

  • One continuously available SMB file share per CSV volume/volume/partition/storage space

    This makes management simpler (one share per underlying storage space), and enables the load to be distributed to all nodes in the file server cluster.

  • Test the performance of encrypted data access (SMB 3 encryption) on file shares before deploying widely

    You can use SMB 3 encryption to help protect data on file shares that require protection from physical security breaches where an attacker has access to the datacenter network, but doing so eliminates most of the performance benefits of using RDMA network adapters.
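
For example, a continuously available share for one CSV volume might be created as in the sketch below. The paths, share name, role name, and account names are placeholders; you also need to grant matching NTFS permissions on the folder, and the SMB encryption line is optional and carries the performance trade-off described above.

    # Create a folder on the CSV volume and share it through the Scale-Out File
    # Server role (SOFS01), with continuous availability enabled.
    New-Item -ItemType Directory -Path C:\ClusterStorage\Volume1\Shares\VMStore01
    New-SmbShare -Name VMStore01 -Path C:\ClusterStorage\Volume1\Shares\VMStore01 `
                 -ScopeName SOFS01 -ContinuouslyAvailable $true `
                 -FullAccess 'CONTOSO\Hyper-V-Admins', 'CONTOSO\COMPUTE01$'

    # Optional: require SMB encryption on this share (test the performance impact first).
    Set-SmbShare -Name VMStore01 -EncryptData $true -Force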

Updates

  • Use Windows Server Update Services in conjunction with Virtual Machine Manager

    Create three to four computer groups in Windows Server Update Services (WSUS) for the file server nodes, adding one or two to each group. With this setup, you can update one server first and monitor its functionality, then update the rest of the servers one at a time so that load continues to be balanced across the remaining servers.

    For more information, see Managing Fabric Updates in VMM (or Deploy Windows Server Update Services in Your Organization if you’re not using Virtual Machine Manager).

  • Use Cluster-Aware Updating for UEFI and firmware updates

    Use Cluster-Aware Updating to update anything that can’t be distributed via WSUS. This probably means the BIOS (UEFI) for the cluster nodes along with the firmware for network adapters, SAS HBAs, drives, and the JBODs.
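
As an illustration, a Cluster-Aware Updating run against the file server cluster can be started with the sketch below; the cluster name is a placeholder, and for firmware you would substitute the Microsoft.HotfixPlugin pointed at a share containing the vendor packages.

    # Patch the file server cluster one node at a time, draining roles as it goes.
    Invoke-CauRun -ClusterName FSCLUSTER01 -CauPluginName Microsoft.WindowsUpdatePlugin `
                  -MaxFailedNodes 1 -MaxRetriesPerNode 3 -RequireAllNodesOnline -Force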

Data Protection Manager

  • You can use Data Protection Manager (DPM) to provide crash-consistent backups of the file server cluster. You can also use DPM and Hyper-V replication for disaster recovery of virtual machines on the compute cluster.

Step 2: Design the management cluster

In this step, you design the management cluster that runs all of the management and infrastructure services for the file server and compute clusters.

noteNote
This solution assumes that you want to use the System Center suite of products, which provides powerful tools to streamline setting up, managing, and monitoring this solution. However, you can alternatively accomplish all tasks via Windows PowerShell and Server Manager (though you’ll probably find Windows PowerShell to be more appropriate due to the scale of this solution). If you choose to forgo using System Center, you probably don’t need as powerful a management cluster as described here, and you might be able to use existing servers or clusters.

Here are the hardware components we recommend for the cluster that runs all of the management and infrastructure services for the file server and compute clusters.

 

Component | Guidelines

Management cluster

  • One 4-node failover cluster

    Using four nodes provides the ability to tolerate one cluster node in the management cluster failing; use six nodes to be resilient to two nodes failing. One management cluster using Virtual Machine Manager can support up to 8,192 virtual machines.

Cluster nodes

  • Two eight-core CPUs

    The virtual machines on this cluster do a significant amount of processing, requiring a bit more CPU power than the file server cluster.

  • 128 GB of RAM

    Running the management virtual machines requires more RAM than is needed by the file server cluster.

  • Two HDDs set up in a RAID-1 (mirror) using a basic RAID controller

    This is where Windows Server is installed on each node. As an option, you can use one or two SSDs. SSDs cost more, but use less power and provide faster startup, setup, and recovery times as well as increased reliability. You can use a single SSD to reduce costs if you’re OK with reinstalling Windows Server on the node if the SSD fails.

Network interface cards

  • One dual-port 10 gigabit Ethernet network interface card with RDMA support

    This card communicates between the management cluster and the file server cluster for access to the .vhdx files used by the management virtual machines. The card requires RDMA support to maximize performance, and iWARP support if you want to route storage traffic between racks of file server and management clusters, which can be relevant when adding additional file server clusters to the solution. This card uses SMB Multichannel and SMB Direct (both part of SMB 3) to provide fault tolerance and high throughput, with each port connected to a separate subnet.

    For a list of certified network interface cards with RDMA support, see the Windows Server Catalog.

  • One dual-port gigabit or 10 gigabit Ethernet network interface card without RDMA support

    This card handles management traffic between all clusters. The card requires support for Virtual Machine Queue (VMQ), Dynamic VMQ, 802.1Q VLAN tagging, and GRE offload (NVGRE). The card uses NIC Teaming to make its two ports, each connected to a separate subnet, fault tolerant.

    The card can’t use RDMA because RDMA requires direct access to the network card, and this card needs to communicate with Hyper-V virtual switches (which prevent that direct access). It uses NIC Teaming for fault tolerance instead of SMB Multichannel and SMB Direct so that protocols other than SMB can make use of the redundant network connections. You should use Quality of Service (QoS) rules to prioritize traffic on this connection.

    For a list of certified network interface cards with NVGRE support, see the Windows Server Catalog.

  • One gigabit Ethernet network interface for remote management

    This integrated lights-out (ILO), baseboard management controller (BMC), or onboard networking adapter connects to your management network.

The following list describes at a high level the software components we recommend for the management cluster:

  • Windows Server 2012 R2 Datacenter

  • Failover Clustering

  • Cluster-Aware Updating

  • Hyper-V

The following list describes at a high level the services that you should run in virtual machines on the management cluster:

  • Active Directory Domain Services (AD DS), DNS Server, and DHCP Server

  • Windows Server Update Services

  • Windows Deployment Services

  • Microsoft SQL Server

  • System Center Virtual Machine Manager

  • System Center Virtual Machine Manager Library Server

  • System Center Operations Manager

  • System Center Data Protection Manager

  • A management console (Windows Server with the GUI installation option)

  • Additional virtual machines are required depending on the services you’re using, such as Windows Azure Pack and System Center Configuration Manager.

Note
Create identical virtual switches on all nodes so that each virtual machine can fail over to any node and maintain its connection to the network.
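
One way to keep the switches identical is to script the team and switch creation and run the same script on every node, as in the sketch below. The adapter, team, and switch names are placeholders, and the RDMA ports are deliberately left out of the team.

    # Team the two non-RDMA ports (the RDMA ports are not teamed).
    New-NetLbfoTeam -Name MgmtTeam -TeamMembers 'NIC 3', 'NIC 4' `
                    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

    # Create the same external virtual switch on every node, bound to the team.
    New-VMSwitch -Name VMSwitch01 -NetAdapterName MgmtTeam `
                 -AllowManagementOS $true -MinimumBandwidthMode Weight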

Step 3: Design the compute cluster

In this step, you design the compute cluster that runs the virtual machines that provide services to tenants.

Here are the hardware components we recommend for the compute clusters. These clusters house tenant virtual machines.

 

Component | Guidelines

Hyper-V compute clusters

  • Each compute cluster contains 32 nodes and hosts up to 2,048 Hyper-V virtual machines. When you’re ready to add extra capacity, you can add up to three additional compute clusters (and their associated file server clusters), for a total of 128 nodes hosting 8,192 virtual machines for 512 tenants (assuming eight virtual machines per tenant).

    See Hyper-V scalability in Windows Server 2012 for more information.

Cluster nodes

  • Two eight-core CPUs

    Two eight-core CPUs are sufficient for a general mix of workloads, but if you intend to run a lot of computation-heavy workloads in your tenant virtual machines, select higher performance CPUs.

  • 128 GB of RAM

    Running the large number of virtual machines (probably 64 per node while all nodes of the cluster are running) requires more RAM than is needed by the file server cluster. Use more RAM if you want to provide more than 2 GB per virtual machine on average.

  • Two HDDs set up in a RAID-1 (mirror) using a basic RAID controller

    This is where Windows Server is installed on each node. As an option, you can use one or two SSDs. SSDs cost more, but use less power and provide faster startup, setup, and recovery times as well as increased reliability. You can use a single SSD to reduce costs if you’re OK with reinstalling Windows Server on the node if the SSD fails.

Network interface cards

  • One dual-port 10 gigabit Ethernet network interface card with RDMA support

    This card communicates with the file server cluster for access to the .vhdx files used by virtual machines. The card requires RDMA support to maximize performance, and iWARP support if you want to route storage traffic between racks of file server and compute clusters, which can be relevant when adding additional file server clusters to the solution. This card uses SMB Multichannel and SMB Direct (both part of SMB 3) to provide fault tolerance and high throughput, with each port connected to a separate subnet.

    For a list of certified network interface cards with RDMA support, see the Windows Server Catalog.

  • One dual-port gigabit or 10 gigabit Ethernet network interface card without RDMA support

    This card handles management and tenant traffic. The card requires support for Virtual Machine Queue (VMQ), Dynamic VMQ, 802.1Q VLAN tagging, and GRE offload (NVGRE). The card uses NIC Teaming to make its two ports, each connected to a separate subnet, fault tolerant.

    The card can’t use RDMA because RDMA requires direct access to the network card, and this card needs to communicate with Hyper-V virtual switches (which prevent that direct access). It uses NIC Teaming for fault tolerance instead of SMB Multichannel and SMB Direct so that protocols other than SMB can make use of the redundant network connections. You should use Quality of Service (QoS) rules to prioritize traffic on this connection.

    For a list of certified network interface cards with NVGRE support, see the Windows Server Catalog.

  • One gigabit Ethernet network interface for remote management

    This integrated lights-out (ILO), baseboard management controller (BMC), or onboard networking adapter connects to your management network and enables you to use System Center Virtual Machine Manager to set up the cluster node from bare-metal hardware. The interface must have support for Intelligent Platform Management Interface (IPMI) or Systems Management Architecture for Server Hardware (SMASH).

The following list describes at a high level the software components we recommend for the compute cluster:

  • Windows Server 2012 R2 Datacenter

  • Failover Clustering

  • Hyper-V

  • Data Center Bridging

  • Cluster-Aware Updating
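
The Data Center Bridging component above is used to protect the SMB storage traffic on the compute nodes. The following sketch shows one way it might be configured; the priority value and adapter names are placeholders, and whether DCB is strictly required depends on the RDMA flavor you chose (RoCE depends on it, iWARP generally does not).

    # Install Data Center Bridging and tag SMB traffic with priority 3.
    Install-WindowsFeature -Name Data-Center-Bridging
    New-NetQosPolicy -Name SMB -SMB -PriorityValue8021Action 3

    # Reserve bandwidth for that priority, enable flow control for it, and
    # apply the settings to the RDMA-capable ports.
    New-NetQosTrafficClass -Name SMB -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
    Enable-NetQosFlowControl -Priority 3
    Enable-NetAdapterQos -Name 'RDMA 1', 'RDMA 2'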

After you have completed the planning steps, see What are the high-level steps to implement this solution?.

Change History

 

June 18, 2014: Updated guidance around how much free space to set aside in each pool for rebuilding storage spaces, and updated virtual disk sizes and other numbers accordingly.

April 2, 2014: Removed Windows Catalog links to SAS disks and SAS HBAs because the links were confusing.

January 22, 2014: Preliminary publication.
