Virtualization Fabric Design Considerations Guide

 

Who is this guide intended for? Information technology (IT) professionals within medium to large organizations who are responsible for designing a virtualization fabric that supports many virtual machines. Through the remainder of this document, these individuals are referred to as fabric administrators. People who administer virtual machines hosted on the fabric are referred to as virtual machine administrators, but they are not a target audience for this document. Within your organization, you may have the responsibility of both roles.

How can this guide help you? You can use this guide to understand how to design a virtualization fabric that is able to host many virtual machines in your organization. In this document, the collection of servers and hypervisors, and the storage and networking hardware that are used to host the virtual machines within an organization is referred to as a virtualization fabric. The following graphic shows an example virtualization fabric.


Figure 1: Example virtualization fabric

Note: Each diagram in this document exists on a separate tab of the Virtualization Fabric Design Considerations Diagrams document, which you can download by clicking the figure name in each table caption.

Although all virtualization fabrics contain servers for storage and hosting virtual machines, in addition to the networks that connect them, every organization’s virtualization fabric design will likely be different than the example illustrated in Figure 1 due to different requirements.

This guide details a series of steps and tasks that you can follow to assist you in designing a virtualization fabric that meets your organization’s unique requirements. Throughout the steps and tasks, the guide presents the relevant technologies and feature options available to you to meet functional and service quality (such as availability, scalability, performance, manageability, and security) level requirements.

Though this document can help you design a manageable virtualization fabric, it does not discuss design considerations and options for managing and operating the virtualization fabric with a product such as Microsoft System Center 2012 or System Center 2012 R2. For more information, see System Center 2012 in the TechNet library.

This guide helps you design a virtualization fabric by using Windows Server 2012 R2 and Windows Server 2012 and vendor-agnostic hardware. Some features discussed in the document are unique to Windows Server 2012 R2, and they are called out throughout the document.

Assumptions: You have some experience deploying Hyper-V, virtual machines, virtual networks, Windows Server file services, and Failover Clustering, and some experience deploying physical servers, storage, and network equipment.

Additional resources

Before designing a virtualization fabric, you may find the information in the following documents helpful:

Both of these documents provide foundational concepts that are observed across multiple virtualization fabric designs and can serve as a basis for any virtualization fabric design.

Feedback: To provide feedback about this document, send e-mail to virtua@microsoft.com.


Did you know that Microsoft Azure provides similar functionality in the cloud? Learn more about Microsoft Azure virtualization solutions.

Create a hybrid virtualization solution in Microsoft Azure:
  • Move VMs between Hyper-V and Microsoft Azure

Design considerations overview

The remainder of this document provides a set of steps and tasks that you can follow to design a virtualization fabric that best meets your requirements. The steps are presented in an ordered sequence, but design considerations that you learn about in later steps may conflict with, and require you to change, decisions that you made in earlier steps. Every attempt is made to alert you to potential design conflicts throughout the document.

You will arrive at the design that best meets your requirements only after iterating through the steps as many times as necessary to incorporate all of the considerations within the document.

Step 1: Determine virtual machine resource requirements

Step 2: Plan for virtual machine configuration

Step 3: Plan for server virtualization host groups

Step 4: Plan for server virtualization hosts

Step 5: Plan for virtualization fabric architecture concepts

Step 6: Plan for initial capability characteristics

Step 1: Determine virtual machine resource requirements

The first step in designing a virtualization fabric is to determine the resource requirements of the virtual machines that the fabric will host. The fabric must include the physical hardware necessary to meet those requirements. The virtual machine resource requirements are dictated by the operating systems and applications that run within the virtual machines. For the remainder of this document, the combination of the operating system and applications that run within a virtual machine is referred to as a workload. The tasks in this step help you define the resource requirements for your workloads.

Tip: Rather than assessing the resource requirements of your existing workloads and then designing a virtualization fabric that is able to support each of them, you may decide to design a virtualization fabric that meets the needs of your most common workloads and then separately address the workloads that have unique needs.

Examples of such virtualization fabrics are those offered by public cloud providers, such as Microsoft Azure (Azure). For more information, see Virtual Machine and Cloud Service Sizes for Azure.

Public cloud providers typically offer a selection of virtual machine configurations that meet the needs of most workloads. If you decide to take this approach, you can skip directly to Step 2: Plan for virtual machine configuration in this document. Additional benefits to using this approach are:

  • When you decide to migrate some of your on-premises virtual machines to a public cloud provider, if your on-premises virtual machine configuration types are similar to those of your public provider, migrating the virtual machines will be easier than if the configuration types are different.

  • It may allow you to more easily forecast capacity requirements and enable a self-service provisioning capability for your virtualization fabric. This means that virtual machine administrators within the organization can automatically self-provision new virtual machines without involvement from the fabric administrators.

Task 1: Determine workload resource requirements

Each workload has requirements for the following resources. First, answer the questions listed below for each of your workloads.

  • Processor: What processor speed, architecture (Intel or AMD), and number of processors are required?

  • Network: In gigabits per second (Gbps), what network bandwidth is required for inbound and outbound traffic? What’s the maximum amount of network latency the workload can tolerate to function properly?

  • Storage: How many gigabytes (GB) of storage do the application and operating system files of the workload require? How many GBs of storage does the workload require for its data? How many input/output operations per second (IOPS) does the workload require to its storage?

  • Memory: In gigabytes (GB), how much memory does the workload require? Is the workload non-uniform memory access (NUMA) aware?

In addition to understanding the previous resource requirements, it’s important to also determine:

  • Whether the resource requirements are minimum or recommended.

  • The peak and average requirements for each resource on an hourly, daily, weekly, monthly, or annual basis.

  • The number of minutes of downtime per month that are acceptable for the workload and the workload’s data. In determining this, factor in the following:

    • Does the workload run on only one virtual machine, or does it run on a collection of virtual machines acting as one, such as a collection of network load-balanced servers all running the same workload? If you are using a collection of servers, the expressed downtime should be clear about whether it applies to each server in the collection, all servers in the collection, or at the collection level.

    • Working and non-working hours. For example, if nobody uses the workload between 9:00 P.M. and 6:00 A.M., but it is critical that it is available as much as possible between 6:00 A.M. and 9:00 P.M. with no more than ten minutes of downtime per month, specify that requirement.

  • The amount of data loss that is acceptable in the event of an unexpected failure of the virtual infrastructure. This is expressed in minutes because virtual infrastructure replication strategies are typically time-based. Although no data loss is often expressed as a requirement, consider that achieving it often comes at a premium price, and it might also come with lower performance.

  • Whether the workload files and/or its data must be encrypted on disk and whether its data must be encrypted between the virtual machines and its end users.

You have the following options available for determining the previous resource requirements.

Option

Advantages

Disadvantages

Manually assess and log resource utilization

Able to report on whatever you choose

Can require significant manual effort

Use the Microsoft Assessment and Planning Toolkit to automatically assess and log resource utilization

  • Creates a variety of resource utilization reports

  • Doesn’t require an agent to be installed on the workload

Reports may or may not provide all the data you require

Note: If you choose to determine your resource requirements manually, you can download Virtualization Fabric Design Considerations Guide Worksheets and enter the information in the Workload resource req. worksheet. This guide references specific worksheets in that document.
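
If you choose the manual option in the preceding table, the following is a minimal sketch of logging utilization with Windows PowerShell performance counters. The computer name APP01, the counters, the sampling interval, and the output path are illustrative assumptions; adjust them for your workloads.

  # Sample processor, memory, disk, and network utilization once per hour for a day.
  Get-Counter -ComputerName APP01 -Counter @(
      '\Processor(_Total)\% Processor Time',
      '\Memory\Available MBytes',
      '\PhysicalDisk(_Total)\Disk Transfers/sec',
      '\Network Interface(*)\Bytes Total/sec') -SampleInterval 3600 -MaxSamples 24 |
      Export-Counter -Path C:\Assessments\APP01.blg -FileFormat BLG

You can open the resulting .blg file in Performance Monitor, or re-import it with Import-Counter, to calculate the peak and average values described in this task.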

Task 2: Define workload characterizations

You can define any number of workload characterizations in your environment. The following examples were selected because each of them requires a different configuration of virtualization fabric components, which will be discussed further in later steps.

  • Stateless: Workloads that write no unique information to their local hard disk after they’re initially provisioned and assigned unique computer names and network addresses. They may, however, write unique information to separate storage, such as a database. Stateless workloads are optimal for running on a virtualization fabric because a “master” image can be created for the virtual machine. This image can be easily copied and booted on the virtualization fabric to add scale to the workload or to quickly replace a virtual machine that becomes unavailable in the event of a virtualization host failure. An example of a stateless workload is a web server running a front-end web application.

  • Stateful: Workloads that write unique information to their local hard disk after they’re initially provisioned and assigned unique computer names and network addresses. They may also write unique information to separate storage, such as a database. Stateful workloads typically require more complex provisioning and scaling strategies than stateless workloads. High availability strategies for stateful workloads might require shared state with other virtual machines. An example of a stateful workload is the SQL Server Database Engine.

  • Shared stateful: Stateful workloads that require some shared state with other virtual machines. These workloads often use Failover Clustering in Windows Server to achieve high availability, which requires access to shared storage. An example of a shared stateful workload is Microsoft System Center – Virtual Machine Manager.

  • Other: Characterizes workloads that may not run at all, or not run optimally, on a virtualization fabric. Attributes of such workloads are that they require:

    • Access to physical peripherals. An example of such an application is a telephony workload that communicates with a telephony network adapter in a physical host.

    • Resource requirements much higher than most of your other workloads. An example is a real-time application that requires less than one millisecond latency between application tiers.

    These applications may or may not run on your virtualization fabric, or they may require very specific hardware or configuration that is not shared by most of your other workloads.

Note: You can define your workload characterizations in the Settings worksheet and then select the appropriate characterization for each workload in the Workload resource req. worksheet.

Step 2: Plan for virtual machine configuration

In this step, you’ll define the types of virtual machines you’ll need to meet the resource requirements and characterizations of the workloads you defined in Step 1.

Task 1: Define compute configuration

In this task, you’ll determine the amount of memory and processors that each virtual machine requires.

Task 1a: Define virtual machine generation type

Windows Server 2012 R2 introduced generation 2 virtual machines. Generation 2 virtual machines support hardware and virtualization features that are not supported in generation 1 virtual machines. It’s important to make the right decision for your requirements, because after a virtual machine has been created, its type cannot be changed.

A generation 2 virtual machine provides the following new functionality:

  • PXE boot by using a standard network adapter

  • Boot from a SCSI virtual hard disk

  • Boot from a SCSI virtual DVD

  • Secure Boot (enabled by default)

  • UEFI firmware support

Generation 2 virtual machines support the following guest operating systems:

  • Windows Server 2012 R2

  • Windows Server 2012

  • 64-bit versions of Windows 8.1

  • 64-bit versions of Windows 8

  • Specific versions of Linux. For a list of distributions and versions that support generation 2 virtual machines, see Linux Virtual Machines on Hyper-V.

The following table lists the advantages and disadvantages of generation 1 and generation 2 virtual machines.

Option

Advantages

Disadvantages

Generation 1

  • Supports all supported Hyper-V guest operating systems

  • Provides compatibility with Azure virtual machines

  • Supports previous versions of Hyper-V

No access to new virtual machine functionality

Generation 2

  • Supports new functionality

  • Provides slight improvement in virtual machine boot and guest installation times

  • Uses SCSI devices or a standard network adapter to boot a virtual machine

  • Prevents unauthorized firmware, operating systems, or UEFI drivers from running when Secure Boot is enabled

  • Limited support for guest operating systems

  • Not compatible with Azure virtual machines

  • No support for RemoteFX

  • No support for virtual floppy disk

Important: Linux generation 2 virtual machines do not support Secure Boot. When you create a virtual machine and you intend to install Linux, you must turn off Secure Boot in the virtual machine settings.

Additional information:

Generation 2 Virtual Machine Overview
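
The following is a minimal sketch of creating a generation 2 virtual machine with Windows PowerShell and disabling Secure Boot for a Linux guest; the virtual machine name, switch name, paths, and sizes are illustrative assumptions.

  # Create a generation 2 virtual machine (names, sizes, and paths are illustrative).
  New-VM -Name "VM01" -Generation 2 -MemoryStartupBytes 2GB -SwitchName "External" `
      -NewVHDPath "D:\VMs\VM01\VM01.vhdx" -NewVHDSizeBytes 60GB

  # For a Linux guest, turn off Secure Boot before starting the virtual machine.
  Set-VMFirmware -VMName "VM01" -EnableSecureBoot Off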

Task 1b: Define memory

You should plan the size of your virtual machine memory as you typically do for server applications on a physical computer. It should reasonably handle the expected load at ordinary times and at peak times. Insufficient memory can significantly increase response times and CPU or I/O usage.

Static Memory or Dynamic Memory

Static memory is the amount of memory assigned to the virtual machine. It is always allocated when the virtual machine is started and it does not change when the virtual machine is running. All of the memory is assigned to the virtual machine during startup and memory that is not being used by the virtual machine is not available to other virtual machines. If there is not enough memory available on the host to allocate to the virtual machine when it is started, the virtual machine will not start.

Static memory is good for workloads that are memory intensive and for workloads that have their own memory management systems, such as SQL Server. These types of workloads will perform better with static memory.

Note: There is no setting to enable static memory. Static memory is enabled when the Dynamic Memory setting is not enabled.

Dynamic Memory allows you to better use the physical memory on a system by balancing the total physical memory across multiple virtual machines, allocating more memory to virtual machines that are busy, and removing memory for less-used virtual machines. This can lead to higher consolidation ratios, especially in dynamic environments such as in the Virtual Desktop Infrastructure (VDI) or web servers.

When using static memory, if a virtual machine is assigned 10 GB of memory and it is only using 3 GB, the remaining 7 GB of memory is not available for use by other virtual machines. When a virtual machine has Dynamic Memory enabled, the virtual machine only uses the amount of memory that is required, but not below the minimum RAM that is configured. This frees up more memory for other virtual machines.

The following table lists the advantages and disadvantages for static memory and Dynamic Memory.

Option

Advantages

Disadvantages

Static memory

  • Provides virtual machines with available configured memory at all times

  • Provides better performance

  • Can be used with virtual NUMA

  • Memory not being used by a virtual machine cannot be allocated to another virtual machine.

  • Virtual machines will not start if there is not enough memory available.

Dynamic Memory

  • Provides improved virtual machine density when running idle or low load workloads

  • Allows allocating memory that is not being used so that it can be used by other virtual machines

  • You can oversubscribe the configured memory.

  • Additional overhead is required to manage memory allocations.

  • Not compatible with virtual NUMA.

  • Not compatible with workloads that implement their own memory managers.

The following are the memory configuration settings:

  • Startup RAM: Specifies the amount of memory required to start the virtual machine. The value needs to be high enough to allow the guest operating system to start, but should be as low as possible to allow for optimal memory utilization and potentially higher consolidation ratios.

  • Minimum RAM: Specifies the minimum amount of memory that should be allocated to the virtual machine after the virtual machine has started. The value can be set as low as 32 MB to a maximum value equal to the Startup RAM value. This setting is only available when Dynamic Memory is enabled.

  • Maximum RAM: Specifies the maximum amount of memory that this virtual machine is allowed to use. The value can be set from as low as the value for Startup RAM to as high as 1 TB. However, a virtual machine can use only as much memory as the maximum amount supported by the guest operating system. For example, if you specify 64 GB for a virtual machine running a guest operating system that supports a maximum of 32 GB, the virtual machine cannot use more than 32 GB. This setting is only available when Dynamic Memory is enabled.

  • Memory weight: Provides Hyper-V with a way to determine how to distribute memory among virtual machines if there is not enough physical memory available in the host to give every virtual machine its requested amount of memory. Virtual machines with a higher memory weight take precedence over virtual machines with lower memory weights.

Notes:

  • Dynamic Memory and virtual NUMA features cannot be used at the same time. A virtual machine that has Dynamic Memory enabled effectively has only one virtual NUMA node, and no NUMA topology is presented to the virtual machine regardless of the virtual NUMA settings.

  • When installing or upgrading the operating system of a virtual machine, the amount of memory that is available to the virtual machine during the installation and upgrade process is the value specified as Startup RAM. Even if Dynamic Memory has been configured for the virtual machine, the virtual machine only uses the amount of memory that is configured in the Startup RAM setting. Ensure that the Startup RAM value meets the minimum memory requirements of the operating system during the installation or upgrade procedures.

  • The guest operating system running in the virtual machine must support Dynamic Memory.

  • Database applications such as SQL Server and Exchange Server implement their own memory managers. Consult the workload’s documentation to determine whether the workload is compatible with Dynamic Memory.

Additional information: 

Dynamic Memory Overview
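
As a minimal sketch, the memory settings described in this task can be configured with the Set-VMMemory cmdlet; the virtual machine names and values shown are illustrative assumptions.

  # Enable Dynamic Memory and set startup, minimum, maximum, and weight values.
  Set-VMMemory -VMName "VM01" -DynamicMemoryEnabled $true -StartupBytes 1GB `
      -MinimumBytes 512MB -MaximumBytes 8GB -Priority 80

  # Or assign static memory by leaving Dynamic Memory disabled.
  Set-VMMemory -VMName "SQL01" -DynamicMemoryEnabled $false -StartupBytes 16GB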

Task 1c: Define processor

The following configuration settings must be determined for configuring virtual machines:

  • Determine the number of processors required for each virtual machine. This will often be the same as the number of processors required by the workload. Hyper-V supports a maximum of 64 virtual processors per virtual machine.

  • Determine resource control for each virtual machine. Limits can be set to ensure that no virtual machine is able to monopolize the processor resources of the virtualization host.

  • Define a NUMA topology. For high-performance NUMA-aware workloads, you can specify the maximum number of processors, the memory amount allowed on a single virtual NUMA node, and the maximum number of nodes allowed on a single processor socket. For more information, read Hyper-V Virtual NUMA Overview.

Note: Virtual NUMA and Dynamic Memory cannot be used at the same time. When you are trying to decide whether to use Dynamic Memory or virtual NUMA, answer the following questions. If the answer to both is Yes, enable virtual NUMA and do not enable Dynamic Memory.

  1. Is the workload running in the virtual machine NUMA-aware?

  2. Will the virtual machine consume more resources (processors or memory) than are available on a single physical NUMA node?
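
A minimal sketch of these processor settings using the Set-VMProcessor cmdlet follows; the virtual machine names and values are illustrative assumptions.

  # Assign four virtual processors and cap the virtual machine at 75 percent of host processor resources.
  Set-VMProcessor -VMName "VM01" -Count 4 -Maximum 75 -Reserve 10 -RelativeWeight 200

  # For a NUMA-aware workload, constrain the virtual NUMA topology (illustrative values).
  Set-VMProcessor -VMName "SQL01" -MaximumCountPerNumaNode 8 -MaximumCountPerNumaSocket 1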

Task 1d: Define supported operating systems

You need to confirm that the operating system required by your workload is supported as a guest operating system. Consider the following:

Note: Hyper-V includes a software package for supported guest operating systems that improves performance and integration between the physical computer and the virtual machine. This collection of services and software drivers is referred to as integration services. For the best performance, your virtual machines should be running the latest integration services.

Licensing

You need to ensure that the guest operating systems are properly licensed. Please review the vendor’s documentation for any specific licensing requirements when you are running a virtualized environment.

Automatic Virtual Machine Activation (AVMA) is a feature that was introduced in Windows Server 2012 R2. AVMA binds the virtual machine activation to the licensed virtualization server and activates the virtual machine when it starts up. This eliminates the need to enter licensing information and activate each virtual machine individually.

AVMA requires that the host is running Windows Server 2012 R2 Datacenter and that the guest virtual machine operating system is Windows Server 2012 R2 Datacenter, Windows Server 2012 R2 Standard, or Windows Server 2012 R2 Essentials.

Note: You need to configure AVMA on each host deployed in your virtualization fabric.

Additional information:

Automatic Virtual Machine Activation
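
As a minimal sketch, after the host is running Windows Server 2012 R2 Datacenter, you apply the published AVMA key inside the guest operating system. The key placeholder below is intentionally not filled in; substitute the value for the guest edition from the Automatic Virtual Machine Activation documentation. AVMA also relies on the Data Exchange (key-value pair) integration service being enabled for the virtual machine.

  # Run inside the guest operating system from an elevated prompt.
  # Replace <AVMA_key> with the published AVMA key for the guest edition.
  slmgr /ipk <AVMA_key>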

Task 1e: Define virtual machine naming convention

Your existing computer naming strategy might indicate where the computer or server is physically located. Virtual machines can move from host to host, even to and from different datacenters, so the existing naming strategy might no longer be applicable. An update to the existing naming convention to indicate that the computer is running as a virtual machine can help locate where the virtual machine is running.

Task 2: Define network configuration

Each virtual machine will receive or send different types of network traffic. Each type of network traffic will have different performance, availability, and security requirements.

Generation 1 virtual machines can have a maximum of 12 network adapters—4 legacy network adapters and 8 virtual network adapters. Generation 2 virtual machines do not support legacy network adapters, so the maximum number of adapters that is supported is 8.

Task 2a: Determine network traffic types

Each virtual machine will send and receive different types of data, such as:

  • Application data

  • Data backup

  • Communications with client computers, servers, or services

  • Intracluster communication, if the workload is part of a guest virtual machine failover cluster

  • Support

  • Storage

If you already have existing networks that are dedicated to different types of network traffic, you may choose to use those for this task. If you’re defining new network designs to support your virtualization fabric, for each virtual machine, you can define which types of network traffic it will support.

Task 2b: Define network traffic performance options

Each network traffic type has maximum bandwidth and minimum latency requirements. The following table shows the strategies that can be used to meet different network performance requirements.

Strategy

Advantages

Disadvantages

Separation of traffic types to different physical network adapters

Separates traffic so it is not being shared by other traffic types

  • Separate physical network adapters must be installed on the host for each network traffic type.

  • Additional hardware is required for each network that requires network high availability.

  • Does not scale well with a large number of networks.

Hyper-V bandwidth management (Hyper-V QoS)

  • Provides QoS for virtual network traffic

  • Enforce minimum bandwidth and maximum bandwidth for a traffic flow, which is identified by a Hyper-V Virtual Switch port number.

  • Configure minimum bandwidth and maximum bandwidth per Hyper-V virtual switch port by using either PowerShell cmdlets or Windows Management Instrumentation (WMI).

  • Configure multiple virtual network adapters in Hyper-V and specify QoS on each virtual network adapter individually.

  • Provides a supplement to QoS policy for the physical network.

  • Software QoS and hardware QoS should not be used at the same time on the same network adapter.

  • You need to properly plan QoS policy for the network and Hyper-V so they do not override each other.

  • After you set the QoS mode for a virtual switch, it cannot be changed.

  • You cannot migrate virtual machines to a host with a virtual switch that is set to use a different QoS mode.

  • Migration will be blocked when absolute bandwidth values that are configured for a virtual machine cannot be honored.

SR-IOV

  • Provides the lowest network latency for a virtual machine

  • Provides the highest network I/O for a virtual machine

  • Reduces the CPU overhead required for virtual networking

  • You need an SR-IOV-capable network adapter and driver in the host and each virtual machine where a virtual function is assigned.

  • SR-IOV-enabled virtual network adapters cannot be part of a NIC team on the host.

  • For network high availability, two or more SR-IOV network adapters need to be installed on the host, and NIC Teaming needs to be configured in the virtual machine.

  • SR-IOV should only be used by trusted workloads because the traffic bypasses the Hyper-V switch and has direct access to the physical network adapter.

  • Configuring virtual switch port ACLs, Hyper-V QoS, RouterGuard, and DHCPGuard will prevent SR-IOV from being used.

  • SR-IOV is not supported for virtual machines running in Azure.

Virtual receive-side scaling

  • Supports virtual receive-side scaling, which allows virtual machines to distribute network processing load across multiple virtual processors (vCPUs) to increase network throughput within virtual machines

  • Provides compatibility with:

    • NIC Teaming

    • Live migration

    • NVGRE

  • Virtual receive-side scaling requires the physical network adapter to support Virtual Machine Queue (VMQ), and it must be enabled on the host.

  • Not compatible with an SR-IOV-enabled virtual network adapter.

  • Virtual machines must be running Windows Server 2012 R2 or Windows 8.1.

  • Disabled by default if the VMQ-capable physical network adapter operates at less than 10 Gbps.

Jumbo frames

  • Allows more data to be transferred with each Ethernet transaction, reducing the number of frames that need to be transmitted

  • Used typically for communication with storage, but can be used for all types of communication

  • Reduces the overhead on the virtual machines, networking equipment, and the end server the data is being sent to

  • Configured for communication within a datacenter where you can control the maximum transmission unit (MTU) settings at all hops

  • Provides a slightly lower error detection probability.

  • Each network device along the path needs to support Jumbo Frames and be configured with the same or higher MTU setting. Use the Ping command to verify end-to-end MTU settings.

  • If one hop along the way does not support Jumbo Frames or is configured with a smaller MTU, the packets will be dropped.
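
For the Hyper-V bandwidth management strategy in the preceding table, the following is a minimal sketch of weight-based QoS; the switch name, adapter name, and values are illustrative assumptions.

  # Create a virtual switch that uses weight-based minimum bandwidth.
  New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "NIC1" -MinimumBandwidthMode Weight

  # Guarantee a relative share of bandwidth and cap the maximum (bits per second) for a virtual machine.
  Set-VMNetworkAdapter -VMName "VM01" -MinimumBandwidthWeight 40 -MaximumBandwidth 2GB

For jumbo frames, you can verify the end-to-end MTU from a virtual machine with, for example, ping <storage-server> -f -l 8972; the command fails if any hop cannot carry the unfragmented payload.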

Task 2c: Define network traffic availability options

NIC Teaming, also known as load balancing and failover (LBFO), allows multiple network adapters to be placed in a team for the purposes of bandwidth aggregation and traffic failover. This maintains connectivity in the event of a network component failure. NIC Teaming is typically configured on the host, and when you create the virtual switch, it is bound to the network adapter team.

The network switches that are deployed determine the NIC Teaming mode. The default settings in Windows Server 2012 R2 should be sufficient for the majority of deployments.

Note: SR-IOV is not compatible with NIC Teaming. For more information about SR-IOV, see Task 2b: Define network traffic performance options.

Additional information:

NIC Teaming Overview
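
A minimal sketch of creating a team on the host and binding a virtual switch to it follows; the team name, adapter names, and teaming mode are illustrative assumptions and should match your physical switch configuration.

  # Create a switch-independent team from two physical adapters.
  New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC1","NIC2" `
      -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

  # Bind the external virtual switch to the team.
  New-VMSwitch -Name "External" -NetAdapterName "VMTeam" -AllowManagementOS $false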

Task 2d: Define network traffic security options

Each network traffic type can have different security requirements, for example, requirements related to isolation and encryption. The following table explains strategies that can be used to meet various security requirements.

Strategy

Advantages

Disadvantages

Separation on different network adapters

Separate traffic from other network traffic

Does not scale well. The more networks you have, the more network adapters you need to install and manage on the host.

IPsec with IPsec Task Offloading

  • Supports IPsec offloading for encryption of network traffic to and from virtual machines using Hyper-V

  • Encrypts traffic while it is traversing the network

  • Setup is complex

  • Can make troubleshooting more difficult because encrypted traffic to and from the hosts and virtual machines cannot be inspected

  • Increased processor utilization when physical network adapters in the host do not support IPsec offloading

VLAN tagging

  • Used by most companies already

  • Compatible with QoS policies

  • Supports Private VLANs

  • Supports VLAN trunk mode for virtual machines

  • Reduces the number of physical adapters that need to be installed in the host

  • Limited to 4094 VLANs

  • Configuration is required for switches, hosts, and virtual machines

  • Incorrect changes to VLAN configuration settings can lead to server-specific or system-wide network issues

Hyper-V Network Virtualization

  • Provides flexible workload placement, including network isolation and IP address reuse without VLANs

  • Enables easier movement of workloads to the cloud

  • Supports live migration across subnets without the need to inject a new IP address on the new server

  • Enables multi-tenant networking solutions

  • Provides simplified network design and improved server and network resource use. The rigidity of VLANs with the dependency of virtual machine placement on a physical network infrastructure typically results in overprovisioning and under use.

  • Management of Hyper-V Network Virtualization requires System Center 2012 R2 - Virtual Machine Manager or a non-Microsoft management solution.

  • A Hyper-V Network Virtualization gateway is required to allow communication outside of the virtual network.

DHCPGuard

  • Blocks the virtual machine from making DHCP offers over the virtual network

  • Configured on a per virtual network adapter basis

  • Does not stop the virtual machine from receiving an address from a DHCP server

Minimal impact on performance when enabled

RouterGuard

  • Blocks the following packets:

    • ICMPv4 Type 5 (redirect message)

    • ICMPv4 Type 9 (router advertisement)

    • ICMPv6 Type 134 (router advertisement)

    • ICMPv6 Type 137 (redirect message)

  • Configured on a per virtual network adapter basis

Minimal impact on performance when enabled
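
As a minimal sketch, the DHCPGuard and RouterGuard protections in the preceding table are enabled per virtual network adapter; the virtual machine name is an illustrative assumption.

  # Block rogue DHCP offers and router advertisements from this virtual machine.
  Set-VMNetworkAdapter -VMName "VM01" -DhcpGuard On -RouterGuard On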

Design decision - You can download Virtualization Fabric Design Considerations Guide Worksheets and change the sample data in the Virtual machine configs. worksheet to capture the decisions you make for all previous tasks in this step. For subsequent design decisions, this document references the specific worksheets where you can enter your data.

Task 2e: Define virtual network adapters

With an understanding of the types of traffic required by the virtual machines, in addition to the performance, availability, and security strategies for the traffic, you can determine how many virtual network adapters each virtual machine will require.

A virtual network adapter is connected to a virtual switch. There are three types of virtual switches:

  • External virtual switch

  • Internal virtual switch

  • Private virtual switch

The external virtual switch provides the virtual machine with access to the physical network through the physical network adapter that is associated with the virtual switch it is connected to. A physical network adapter in the host can only be associated with a single external virtual switch.

As noted in Task 2, generation 1 virtual machines support a maximum of 12 network adapters (4 legacy network adapters and 8 virtual network adapters), and generation 2 virtual machines support a maximum of 8 because they do not support legacy network adapters. A virtual network adapter can have one VLAN ID assigned to it, unless it is configured in trunk mode.

If you are going to assign virtual machine traffic to different VLANs, a network adapter that supports VLANs must be installed in the host and assigned to the virtual switch. You can set the VLAN ID for the virtual machine in the properties of the virtual machine. The VLAN ID that is set in the virtual switch is the VLAN ID that will be assigned to the virtual network adapter assigned to the host operating system.

Note: If you have a virtual machine that requires access to more networks than available adapters, you can enable VLAN trunk mode for a virtual machine network adapter by using the Set-VMNetworkAdapterVlan Windows PowerShell cmdlet.
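
For example, a minimal sketch of assigning VLANs to a virtual network adapter follows; the virtual machine name and VLAN IDs are illustrative assumptions.

  # Assign a single VLAN to the virtual machine's network adapter (access mode).
  Set-VMNetworkAdapterVlan -VMName "VM01" -Access -VlanId 20

  # Or allow several VLANs on one adapter by enabling trunk mode.
  Set-VMNetworkAdapterVlan -VMName "VM01" -Trunk -AllowedVlanIdList "10,20,30" -NativeVlanId 10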

Task 2f: Define IP addressing strategy

You need to determine how you will assign IP addresses to your virtual machines. If you don't, you can have IP address conflicts, which can have a negative impact on other virtual machines and physical devices on the network.

Additionally, unauthorized DHCP servers can cause havoc on your network infrastructure, and they can be especially difficult to track down when the server is running as a virtual machine. You can protect your network against unauthorized DHCP servers running on a virtual machine by enabling DHCPGuard in the settings of your virtual machines. DHCPGuard protects against a malicious virtual machine representing itself as a DHCP server for man-in-the-middle attacks.

Additional information:

Dynamic Host Configuration Protocol (DHCP) Overview

DHCPGuard

IP Address Management (IPAM) Overview

Task 3: Define storage configuration

To determine your storage configuration, you need to define the data types that the virtual machines will store and the type of storage they need.

Task 3a: Define data types

The following table lists the types of data that a virtual machine may need to store and where that type of data is often stored.

Data type

Storage location for data type

Operating system files

Within a virtual hard disk file that is stored by the virtualization host. Storage considerations for the virtualization host are addressed further in Step 4: Plan for server virtualization hosts below.

Windows page file

Often stored in the same location as the operating system files.

Application program files

Often stored in the same location as the operating system files.

Application configuration data

Often stored in the same location as the operating system files.

Application data

Often stored separately from the application and operating system files. For example, if the application was a database application, the database files are often stored on a high availability, efficient, network-based, storage solution that is separate from the location where the operating system or application program files are stored.

Cluster Shared Volumes (CSV) and disk witness (required for guest virtual machine clustering)

Often stored separately from the application and operating system files.

  • CSV storage is where clustered applications store data so that it is available to all nodes in the cluster.

  • A disk witness is a disk in the cluster storage that is designated to hold a copy of the cluster configuration database. A failover cluster has a disk witness only if this is specified as part of the quorum configuration.

Crash dump files

Often stored in the same location as the operating system files.

Task 3b: Define storage types

The following table lists the types of storage that might be used for the data types defined in Task 3a above.

Storage type

Considerations

Virtual IDE disk

Generation 1 virtual machines:

  • 2 IDE controllers, and each controller can support a maximum of 2 IDE devices for a maximum of 4 IDE devices.

  • The startup disk, also known as the boot disk, must be attached to one of the IDE devices as a virtual hard disk or as a physical disk.

Generation 2 virtual machines do not support IDE devices.

Virtual SCSI

  • 4 virtual SCSI controllers are supported with each controller supporting up to 64 devices for a total of 256 SCSI devices.

  • Generation 2 virtual machines support only SCSI controllers, so they boot from a SCSI virtual hard disk or a SCSI virtual DVD.

iSCSI initiator in the virtual machine

  • Take advantage of storage on SANs without installing Fibre Channel adapters in the host.

  • Cannot be used for the boot disk.

  • Use network QoS policies to ensure proper bandwidth availability for storage and other network traffic.

  • Not compatible with Hyper-V Replica. When using a SAN storage backend, use the SAN replication options provided by your storage vendor.

Virtual Fibre Channel

  • Requires one or more Fibre Channel host bus adapters (HBAs) or Fibre Channel over Ethernet (FCoE) converged network adapters in each host that will host virtual machines with virtual Fibre Channel adapters.

  • HBA and FCoE drivers must support virtual Fibre Channel.

  • Requires an NPIV-enabled SAN.

  • Requires additional configuration to support live migration. For additional information about live migration and virtual Fibre Channel, see Hyper-V Virtual Fibre Channel Overview.

  • Not compatible with Hyper-V Replica. When using SAN storage, you should use the SAN replication options provided by your storage vendor.

  • A virtual machine can have up to four virtual Fibre Channel ports.

  • Virtual Fibre Channel LUNs cannot be used as boot media for the virtual machine.

SMB 3.0

Access files stored on Server Message Block (SMB) 3.0 shares from within the virtual machine.
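
For the iSCSI initiator option in the preceding table, the following is a minimal sketch of connecting to a target from within the guest operating system; the portal address is an illustrative assumption.

  # Inside the virtual machine: start the initiator service, register the portal, and connect.
  Start-Service MSiSCSI
  Set-Service MSiSCSI -StartupType Automatic
  New-IscsiTargetPortal -TargetPortalAddress "10.0.10.50"
  Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true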

Task 3c: Define virtual hard disk format and type

If you are using the virtual hard disk storage type, you must first select the VHD format that you’ll use from the options listed in the following table.

Disk format

Advantages

Disadvantages

VHD

  • Supported by all versions of Hyper-V

  • Supported by both on-premises implementations and Azure

  • Maximum storage capacity is 2040 GB

  • Maximum virtual hard disk supported by Azure is 1 TB

  • Not supported by generation 2 virtual machines

VHDX

  • Maximum storage capacity is 64 terabytes (TB)

  • Protection against data corruption during power failure

  • Improved alignment of the virtual hard disk format to work well on large sector disks

  • A 4-KB logical sector virtual disk that allows for increased performance when used by applications and workloads that are designed for 4 KB sectors

  • Can be used as shared storage for virtual machines that require Failover Clustering

  • Not currently supported by virtual machines in Azure

  • Cannot be used with versions of Hyper-V prior to Windows Server 2012

Shared VHDX

Used for shared storage for guest virtual machine clusters

  • Requires Windows Server 2012 R2 on the host running Hyper-V

  • Supported guest operating systems for guest clusters that use a shared virtual hard disk include Windows Server 2012 R2 and Windows Server 2012. To support Windows Server 2012 as a guest operating system, Windows Server 2012 R2 Integration Services must be installed within the guest (virtual machine).

  • The following features are not compatible with shared VHDX:

    • Hyper-V Replica

    • Resizing the virtual hard disk while any of the configured virtual machines are running

    • Live storage migration

    • Host-level VSS backups. Guest-level backups should be performed by using the same methods that you would use for a cluster running on physical servers.

    • Virtual machine checkpoints

    • Storage QoS

Next, select the type of disk you will use from the options listed in the following table.

Disk type

Advantages

Disadvantages

Fixed

  • Less likely to suffer from fragmentation than other disk types

  • Lower CPU overhead than other disk types

  • After the VHD file is created, there is less concern about available disk space than there is with the other disk types

  • Supported by both on-premises implementations and Azure

  • A created virtual hard disk requires all of the space to be available, even if the virtual machine is not using all of the space.

  • The virtual hard disk will fail to create if there is not enough storage space available.

  • Unused space in the virtual hard disk cannot be allocated to other virtual hard disks.

Dynamic

Only uses disk space as required, rather than using all that’s been provisioned

  • Not currently supported by Azure, though dynamic disks can be converted to fixed disks

  • It is important to monitor free disk space when using dynamic virtual hard disks. If disk space is not available for a dynamic virtual hard disk to grow, the virtual machine will enter a paused-critical state.

  • Virtual hard disk file can become fragmented

  • Slightly higher CPU overhead for read and write operations than there is for the fixed disk type

Differencing

Can use less disk space if multiple differencing disks use the same parent

  • Not currently supported by Azure

  • Changes made to a parent disk can cause data inconsistency in the child disk

  • Slightly higher CPU overhead for read and write operations for high I/O intensive workloads

Consider the following when you are selecting a virtual hard disk file type and format:

  • When you use the VHDX format, a dynamic disk can be used because it offers resiliency guarantees in addition to space savings that are associated with allocating space only when there is a need to do so.

  • A fixed disk can also be used, irrespective of the format, when the storage on the hosting volume is not actively monitored. This ensures that sufficient disk space is present when the VHD file is expanded at run time.

  • Checkpoints (formerly known as snapshots) of a virtual machine create a differencing virtual hard disk to store writes to the disks. Having only a few checkpoints can elevate the CPU usage of storage I/O, but they might not noticeably affect performance (except in highly I/O-intensive server workloads).

    However, having a large chain of checkpoints can noticeably affect performance because reading from the virtual hard disks can require checking for the requested blocks in many differencing disks. Keeping short checkpoint chains is important for maintaining good disk I/O performance.
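
A minimal sketch of creating and converting virtual hard disks with the formats and types discussed above follows; the paths and sizes are illustrative assumptions.

  # Create a dynamically expanding VHDX for a general-purpose workload.
  New-VHD -Path "D:\VMs\VM01\Data.vhdx" -Dynamic -SizeBytes 200GB

  # Create a fixed-size disk where the hosting volume is not actively monitored.
  New-VHD -Path "D:\VMs\SQL01\Data.vhdx" -Fixed -SizeBytes 500GB

  # Convert a VHDX to a fixed VHD, for example before uploading the disk to Azure.
  Convert-VHD -Path "D:\VMs\VM01\Data.vhdx" -DestinationPath "D:\Export\Data.vhd" -VHDType Fixed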

Task 3d: Define which storage type to use for each data type

After you define the data types and storage types that virtual machines will store, you can determine which storage type and which virtual disk format and type you’ll use for each data type.

Task 4: Define virtual machine availability strategy

Though fabric administrators are responsible for the availability of the fabric, virtual machine administrators are ultimately responsible for the availability of their virtual machines. As a result, the virtual machine administrator must understand the capabilities of the fabric to design the appropriate availability strategy for their virtual machines.

The following tables analyze three availability strategies for virtual machines running workloads with the characterizations that are defined in Step 1, Task 2 above. Typically, the fabric administrator informs virtual machine administrators in advance when planned downtime activities are scheduled for the fabric so that virtual machine administrators can plan accordingly. The three availability strategies are:

  • Stateless

  • Stateful

  • Shared stateful

Stateless

Option

Considerations

Virtual Machine Live Migration at the host level

  • If a host needs to be taken down for planned maintenance, the running virtual machines can be migrated to an operable host with no downtime to the virtual machines. For more information about host considerations, see Task 5: Define server virtualization host availability strategy below.

  • If the virtual machines are not stored on storage that is accessible by both hosts, you need to move the virtual machine storage during a live migration.

  • If a host fails unexpectedly, all virtual machines running on the host stop running. You need to start the virtual machines by running the same workload on another host.

Load-balanced clusters (by using Windows Network Load Balancing)

  • Requires that the virtual machine administrator have at least two virtual machines running an identical workload hosted on different hosts.

  • Network Load Balancing (NLB) is configured within the virtual machines by the virtual machine administrator.

  • NLB requires that static IP addresses are assigned to the network adapters. DHCP address assignment is not supported.

  • The virtual machine administrator needs to work with the fabric administrator to get IP addresses to use for the NLB virtual IP address and to create the required DNS entry.

  • Enable MAC spoofing for the virtual network adapter that is used by NLB in the guests. This can be done from the Network Adapter settings on each virtual machine that is participating in an NLB cluster as a node. You can create NLB clusters, add nodes, and update NLB cluster configurations without rebooting the virtual machines.

  • All of the virtual machines that are participating in the NLB cluster must be on the same subnet.

  • To ensure availability of the workload (even in the case of host failure), the virtual machine administrator needs to work with the fabric administrator to ensure that the virtual machines are running on different hosts.

Load-balanced clusters (by using a hardware load balancer)

  • The fabric must provide this capability, and fabric administrators must configure load-balanced clusters for virtual machines that require it. Alternatively, they can enable virtual machine administrators to configure it through the management portal for the hardware load balancer.

  • Requires that the virtual machine administrator have at least two virtual machines running an identical workload hosted on the fabric.

  • Review the hardware vendor’s product documentation for additional information.
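
For the live migration option in the preceding table, a minimal sketch follows; the virtual machine, host, and path names are illustrative assumptions, and both hosts must be configured to allow live migrations.

  # Enable live migrations on each host (authentication and migration networks may need
  # additional configuration in your environment).
  Enable-VMMigration

  # Move a running virtual machine, including its storage, to another host.
  Move-VM -Name "VM01" -DestinationHost "HOST02" -IncludeStorage -DestinationStoragePath "D:\VMs\VM01"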

Stateful

Option

Considerations

Hyper-V Cluster

  • Requires configuration of a failover cluster.

  • Requires shared storage between all of the nodes in the cluster for the CSV files. This can be SAN storage or an SMB 3.0 file share.

  • When the cluster detects an issue with a host or Hyper-V detects an issue with the virtual machine networking or storage, the virtual machine can be moved to another host. The virtual machine continues to run during the move.

  • If there is a catastrophic failure of a host, the virtual machines that were running on that host can be started on other nodes in the cluster. Critical virtual machines can be configured to automatically start. This limits the amount of downtime if there is a catastrophic host failure.

  • Patch hosts without impacting running virtual machines with Cluster-Aware Updating.

  • Configure virtual machine anti-affinity to avoid running virtual machines on the same node. For example, if you are running two web servers that provide front-end services for a back-end application, you do not want both web servers running on the same node.

  • A node can be placed into maintenance mode and the failover cluster service will move the running virtual machines to another node in the cluster. When there are no running virtual machines on the node, the required maintenance can be performed.

    The failover cluster will not move virtual machines to a node in maintenance mode. Before putting a node in maintenance mode, ensure that there is enough capacity on the other nodes in the Hyper-V cluster to run the existing virtual machines and still maintain the SLAs for your customers.
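
As a minimal sketch, a Hyper-V cluster and a clustered virtual machine might be created as follows; the node names, cluster name, IP address, and disk name are illustrative assumptions.

  # Validate and create the cluster from two Hyper-V hosts.
  Test-Cluster -Node "HOST01","HOST02"
  New-Cluster -Name "HVCLUSTER" -Node "HOST01","HOST02" -StaticAddress "10.0.10.200"

  # Add shared storage as a Cluster Shared Volume and make an existing virtual machine highly available.
  Add-ClusterSharedVolume -Name "Cluster Disk 1"
  Add-ClusterVirtualMachineRole -VMName "VM01"

Virtual machine anti-affinity is configured by setting the AntiAffinityClassNames property on the clustered virtual machine groups so that, for example, two web front-end virtual machines prefer different nodes.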

Shared stateful

When running cluster-aware workloads, you can provide an additional layer of availability by enabling virtual machine guest clustering. Guest clustering supports high availability for workloads within the virtual machine. Guest clustering provides protection to the workload that is running in the virtual machines, even if a host fails where one of the virtual machines is running. Because the workload is protected by Failover Clustering, the virtual machine on the other node can take over automatically.

Option

Considerations

Virtual Machine Guest Clustering

  • Requires shared storage that is accessible by two or more virtual machines at the same time. Supported connection types include:

    • iSCSI

    • Virtual Fibre Channel

    • Shared VHDX

  • Configure virtual machine anti-affinity to prevent both virtual machines from running on the same cluster node.

  • Virtual machine guest clustering is not supported by Azure.

  • The following features are not compatible with shared VHDX:

    • Hyper-V Replica

    • Resizing the virtual hard disk while any of the configured virtual machines are running

    • Live storage migration

    • Host-level VSS backups. Guest-level backups should be performed by using the same methods that you would use for a cluster running on physical servers.

    • Virtual machine checkpoints

    • Storage QoS

Additional information:

Deploy a Guest Cluster Using a Shared Virtual Hard Disk

Using Guest Clustering for High Availability
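
As a minimal sketch, a shared VHDX might be created on a Cluster Shared Volume and attached to two guest-cluster nodes as follows; the paths and virtual machine names are illustrative assumptions, and the -ShareVirtualDisk switch reflects the Windows Server 2012 R2 guidance linked above.

  # Create a fixed-size data disk on a Cluster Shared Volume.
  New-VHD -Path "C:\ClusterStorage\Volume1\SharedData.vhdx" -Fixed -SizeBytes 100GB

  # Attach the same file to both guest-cluster virtual machines as a shared SCSI disk.
  Add-VMHardDiskDrive -VMName "NODE1" -ControllerType SCSI -Path "C:\ClusterStorage\Volume1\SharedData.vhdx" -ShareVirtualDisk
  Add-VMHardDiskDrive -VMName "NODE2" -ControllerType SCSI -Path "C:\ClusterStorage\Volume1\SharedData.vhdx" -ShareVirtualDisk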

Disaster Recovery

If there is a disaster, how quickly can you get the required workloads up and running so they can service clients? In some cases, the allotted time can be only a few minutes.

Replication of data from your main datacenters to your disaster recovery sites is required so that the most up-to-date data is available, with data loss limited to an acceptable window introduced by replication delays. By running workloads in virtual machines, you can replicate the virtual hard disks and the virtual machine configuration files from your primary site to a replica site.

The following table compares disaster recovery options.

Option

Considerations

Hyper-V Replica

  • Inexpensive, and there is no need to duplicate host and storage hardware at disaster recovery sites.

  • Use the same management tools to manage replication as to manage the virtual machines.

  • Configurable replication intervals to meet your data loss requirements.

  • Configure different IP addresses to be used at the replica site.

  • Minimal impact on network infrastructure.

  • Not supported for virtual machines configured with physical disks (also known as pass-through disks), virtual Fibre Channel storage, or shared virtual hard disks.

  • Hyper-V Replica should not be used as a replacement for data backup storage and data retrieval.

  • Additional storage will be required at the replica site if additional recovery points are configured.

  • Replication interval rate will determine the amount of data loss.

  • Additional storage is required at the replica site when a virtual machine with a large amount of changes is configured with a short replication interval.

Backup

  • Back up the complete virtual machine by using a Hyper-V supported backup solution, such as System Center Data Protection Manager.

  • Data loss will be determined by how old the last backup is.

  • Virtual machines configured with a shared VHDX file cannot be backed up at the host level. Install a backup agent in the virtual machine and back up the data from within the virtual machine.

Notes: 

  • To centrally manage and automate replication when running System Center 2012 R2 - Virtual Machine Manager, you need to use Microsoft Azure Site Recovery.

  • You can also replicate virtual machines to Azure by using Microsoft Azure Site Recovery. Replicating a virtual machine to Azure is currently in preview.

Additional information:

Microsoft Azure Site Recovery

Important: 

  • Use the Hyper-V Replica Capacity Planner to understand the impact Hyper-V Replica will have on your network infrastructure; processor utilization on the primary, replica, and extended replica servers; memory usage on the primary and replica servers; and disk IOPS on the primary, replica, and extended replica servers that are based on your existing virtual machines.

  • Your workload might have a built-in disaster recovery solution, such as AlwaysOn Availability Groups in SQL Server. Consult with the workload documentation to confirm if Hyper-V Replica is supported by the workload.

Additional information:

Hyper-V Replica

System Center Data Protection Manager
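
A minimal sketch of enabling Hyper-V Replica for a virtual machine follows; the server names, port, storage location, and five-minute interval are illustrative assumptions, and the required firewall rules must also be enabled on the replica host.

  # On the replica host: allow incoming replication (Kerberos over HTTP in this sketch).
  Set-VMReplicationServer -ReplicationEnabled $true -AllowedAuthenticationType Kerberos `
      -ReplicationAllowedFromAnyServer $true -DefaultStorageLocation "D:\Replicas"

  # On the primary host: enable replication for the virtual machine and start the initial copy.
  Enable-VMReplication -VMName "VM01" -ReplicaServerName "REPLICA01.contoso.com" `
      -ReplicaServerPort 80 -AuthenticationType Kerberos -ReplicationFrequencySec 300
  Start-VMInitialReplication -VMName "VM01"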

Task 5: Define virtual machine types

To support the workloads in your environment, you might create virtual machines with unique resource requirements to meet the needs of every workload. Alternatively, you might take an approach similar to that of public providers of virtual machine hosting services (also referred to as Infrastructure as a Service, or IaaS).

See Virtual Machine and Cloud Service Sizes for Azure for a description of the virtual machine configurations offered by Microsoft Azure Infrastructure Services. As of this writing, the service supported 13 virtual machine configurations, each with a different combination of processor, memory, storage capacity, and IOPS.

Design decision - The decisions you make in all tasks of this step can be entered in the Virtual machine configs. worksheet.

Step 3: Plan for server virtualization host groups

Before you define individual server hosts, you may want to first define host groups. A host group is simply a named collection of servers that are grouped together to meet the common goals that are outlined in the remaining tasks of this step.

Task 1: Define physical locations

You’ll likely group and manage hardware resources by physical location, so you’ll want to first define the locations that will contain fabric resources within your organization.

Task 2: Define host group types

You may create host groups for any number of reasons, such as to host workloads with specific:

  • Workload characterizations

  • Resource requirements

  • Service quality requirements

The following image illustrates an organization that has created five host groups in two locations.


Figure 2: Host group example

The organization created the host groups for the reasons outlined in the following table.

Host group

Reasons for creating it

Stateless and stateful workload

  • These workload characterizations are the most common in this organization, so they have this type of host group in both locations.

  • These workloads have similar performance and service-level requirements.

Accounting department stateful and stateless workloads

Though the hardware configuration of the servers in this host group is the same as the other stateless and stateful workload host groups in the environment, the Accounting department has applications with higher security requirements than other departments in the organization. As a result, a dedicated host group was created for the department so that it could be secured differently than the other host groups in the fabric.

Shared stateful workloads

The workloads hosted by this host group require shared storage because they rely on Failover Clustering in Windows Server to maintain their availability. These workloads are hosted by a dedicated host group because the configuration of these virtual machines is different than the other virtual machines in the organization.

High I/O stateful workloads

All the hosts in this host group are connected to higher speed networks than the hosts in the other host groups.

Though the organization could have spanned locations with their host groups, they chose to keep all members of each host group within the same location to ease their management. As you can see from this example, host groups can be created for a variety of reasons, and those reasons will vary across organizations. The more types of host groups you create in your organization, the more complex the environment will be to manage, which ultimately adds to the cost of hosting virtual machines.

Tip: The more standardized the server hardware is within a host group, the easier it will be to scale and maintain the host group over time. If you determine that you want to standardize the hardware within a host group, you can add the standardized configuration data to the Host groups worksheet in Virtualization Fabric Design Considerations Worksheets. For more information about physical hardware considerations, see Step 4: Plan for server virtualization hosts.

Consider that currently, most public cloud providers that host virtual machines:

  • Only host virtual machines that don’t require shared state.

  • Often only have one set of service quality metrics that they provide to all customers.

  • Do not dedicate specific hardware to specific customers.

We recommend that you start with one host group type that contains identical hardware, and add additional host group types only when the benefit of doing so outweighs the cost.

Task 3: Determine whether to cluster host group members

In the past, Failover Clustering in Windows Server was only used to increase server availability, but it has grown to provide significantly more functionality. Consider the information in the following table to help you decide whether you’ll want to cluster your host group members.

Option

Advantages

Disadvantages

Host group members are part of a failover cluster

  • If any host fails, the virtual machines it’s hosting automatically restart on surviving nodes.

  • Virtual machines can be moved to another node in the cluster when the node they are currently running on detects an issue with the node itself or with the virtual machine.

  • Use Cluster-Aware Updating to easily update nodes in the cluster without impacting running virtual machines.

  • Hosts require specific configuration to be cluster members.

  • Hosts must be members of an Active Directory domain.

  • Failover Clustering imposes additional networking and storage requirements.

Host group members are not part of a failover cluster

  • Hosts do not require a specific cluster configuration.

  • Hosts do not have to be members of an Active Directory domain.

  • Additional networking and storage are not required.

Virtual machines running on a host that fails must be moved to a surviving host and restarted, either manually or through some form of automation.

Design decision - The decisions you make in all tasks of this step can be entered in the Settings worksheet.

Step 4: Plan for server virtualization hosts

In this step, you’ll define the types of hosts you’ll need to host the virtual machines you plan to run on your virtualization fabric. You will want to limit the number of host configurations, in some cases to a single configuration, to ease procurement and support costs. Additionally, purchasing the wrong equipment will drive up deployment costs.

Cloud Platform System

Microsoft brings its experience running some of the largest datacenters and cloud services into a factory-integrated and fully validated converged system. Cloud Platform System (CPS) combines Microsoft’s proven software stack of Windows Server 2012 R2, System Center 2012 R2, and Windows Azure Pack, with Dell’s cloud server, storage and networking hardware. As a scalable building block for your cloud, CPS shortens the time to value and enables a consistent cloud experience.

CPS provides a self-service, multi-tenant cloud environment for Platform-as-a-Service, Windows and Linux virtual machines, and includes optimized deployment packs for Microsoft applications like SQL Server, SharePoint, and Exchange. The factory integration decreases risk and complexity while accelerating deployment time from months to days. The simplified support process and automation of routine infrastructure tasks also frees up IT resources to focus on innovation.

For additional information, see the Cloud Platform System site.

Fast Track

Rather than designing your hardware (and software) configuration, you can purchase preconfigured hardware configurations from a variety of hardware partners through the Microsoft Private Cloud Fast Track program.

The Fast Track program is a joint effort between Microsoft and its hardware partners to deliver validated, preconfigured solutions that reduce the complexity and risk of implementing a virtualization fabric and the tools to manage it.

The Fast Track program provides flexibility of solutions and customer choice across hardware vendors’ technologies. It uses the core capabilities of the Windows Server operating system, Hyper-V technology, and Microsoft System Center to deliver the building blocks of a private cloud infrastructure as a service offering.

Additional information:

Microsoft Private Cloud Fast Track site

Task 1: Define compute configuration

In this task, you’ll determine the amount of memory, number of processors, and the version of Windows Server that are required for each host. The number of virtual machines to run on a host will be determined by the hardware components discussed in this section.

Note: To ensure that your solution is fully supported, all hardware that you purchase must carry the Certified for Windows Server logo for the version of Windows Server you are running.

The Certified for Windows Server logo demonstrates that a server system meets Microsoft’s highest technical bar for security, reliability, and manageability. Together with other certified devices and drivers, it can support the roles, features, and interfaces required for cloud and enterprise workloads, as well as for business-critical applications.

For the latest list of Certified for Windows Server hardware, see the Windows Server Catalog.

Task 1a: Define processor

Hyper-V presents the logical processors to each active virtual machine as one or more virtual processors. You can achieve additional run-time efficiency by using processors that support Second Level Address Translation (SLAT) technologies such as Extended Page Tables (EPTs) or Nested Page Tables (NPTs). Hyper-V in Windows Server 2012 R2 supports a maximum of 320 logical processors.

Considerations:

  • Workloads that are not processor intensive should be configured to use one virtual processor. Monitor host processor utilization over time to ensure that you’ve allocated processors for maximum effectiveness.

  • Workloads that are CPU intensive should be assigned two or more virtual processors. You can assign a maximum of 64 virtual processors to a virtual machine. The number of virtual processors recognized by the virtual machine is dependent on the guest operating system. For example, Windows Server 2008 with Service Pack 2 recognizes only four virtual processors.
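As a hedged example, the following Windows PowerShell sketch assigns virtual processors to a virtual machine and caps its share of host processor resources. The virtual machine name and values are assumptions for illustration, and the virtual machine must be turned off when the processor count is changed.

```powershell
# Assign four virtual processors to a CPU-intensive virtual machine (run while the VM is off).
Set-VMProcessor -VMName "SQL01" -Count 4

# Optionally limit the virtual machine to 75 percent of the host processor resources allocated to it.
Set-VMProcessor -VMName "SQL01" -Maximum 75
```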

Additional information:

Hyper-V Overview

Performance Tuning for Hyper-V Servers

Task 1b: Define memory

The physical server requires sufficient memory for the host and running virtual machines. The host requires memory to efficiently perform I/O on behalf of the virtual machines and operations such as a virtual machine checkpoint. Hyper-V ensures that sufficient memory is available to the host, and it allows remaining memory to be assigned to the virtual machines. Virtual machines should be sized based on the needs of the expected load for each virtual machine.

The hypervisor virtualizes the guest physical memory to isolate virtual machines from each other and to provide a contiguous, zero-based memory space for each guest operating system, the same as on non-virtualized systems. To ensure that you get maximum performance, use SLAT-based hardware to minimize the performance cost of memory virtualization.

Size your virtual machine memory as you typically do for server applications on a physical computer. The amount of memory assigned to the virtual machine should allow the virtual machine to reasonably handle the expected load at ordinary and peak times because insufficient memory can significantly increase response times and CPU or I/O usage.

Memory that has been allocated for a virtual machine reduces the amount of memory that is available to other virtual machines. If there is not enough available memory on the host, the virtual machine will not start.

Dynamic Memory enables you to attain higher consolidation numbers with improved reliability for restart operations. This can lead to lower costs, especially in environments that have many idle or low-load virtual machines, such as pooled VDI environments. Dynamic Memory run-time configuration changes can reduce downtime and provide increased agility to respond to requirement changes.

For more information about how to determine the amount of memory to assign to a virtual machine, including Dynamic Memory settings, see Step 2: Plan for virtual machine configuration.
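The following sketch shows one way to enable Dynamic Memory on a virtual machine by using Windows PowerShell. The virtual machine name and the memory values are illustrative assumptions that you would replace with sizes appropriate for the expected load.

```powershell
# Enable Dynamic Memory with example startup, minimum, maximum, and buffer values
# (run while the virtual machine is turned off).
Set-VMMemory -VMName "WEB01" `
    -DynamicMemoryEnabled $true `
    -StartupBytes 1GB `
    -MinimumBytes 512MB `
    -MaximumBytes 4GB `
    -Buffer 20
```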

Additional information:

Dynamic Memory Overview

Virtual NUMA Overview

Task 1c: Define Windows Server operating system edition

The feature sets in Windows Server Standard and Windows Server Datacenter are exactly the same; the editions differ in virtualization rights. Windows Server Datacenter licenses an unlimited number of virtual machines, while Windows Server Standard is limited to two virtual machines per license.

In Windows Server 2012 R2, the Automatic Virtual Machine Activation (AVMA) feature was added. AVMA lets you install virtual machines on a properly activated server without having to manage product keys for each virtual machine, even in disconnected environments.

AVMA requires that the guest operating systems are running Windows Server 2012 R2 Datacenter, Windows Server 2012 R2 Standard, or Windows Server 2012 R2 Essentials. The following table compares the editions.

Edition

Advantages

Disadvantages

Standard

  • Includes all Windows Server features

  • Acceptable for non-virtualized or lightly virtualized environments

Limited to two virtual machines

Datacenter

  • Includes all Windows Server features

  • Allows unlimited virtual machines

  • Acceptable for highly-virtualized private cloud environments

More expensive
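If you use AVMA as described above, activating a guest is a single command run inside the guest operating system. The following sketch assumes the guest is hosted on an activated Windows Server 2012 R2 Datacenter host; the placeholder must be replaced with the published AVMA client key for the guest edition, which is listed in the Automatic Virtual Machine Activation topic.

```powershell
# Run from an elevated command prompt or PowerShell session inside the guest operating system.
# Replace the placeholder with the published AVMA client key for the guest edition
# (Datacenter, Standard, or Essentials).
slmgr /ipk <AVMA client key for the guest edition>
```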

Hyper-V can be installed on a Server Core installation option of Windows Server. A Server Core installation reduces the space required on the disk, the potential attack surface, and especially the servicing requirements. A Server Core installation is managed by using the command line, Windows PowerShell, or by remote administration.

It is important to review the licensing terms of any software you are planning to use.

Microsoft Hyper-V Server

Microsoft Hyper-V Server provides a simple and reliable virtualization solution to help organizations improve their server utilization and reduce costs. It is a stand-alone product that contains only the Windows hypervisor, a Windows Server driver model, and virtualization components.

Hyper-V Server can fit into customers’ existing IT environments and leverage their existing provisioning, management processes, and support tools. It supports the same hardware compatibility list as the corresponding editions of Windows Server, and it integrates fully with Microsoft System Center and Windows technologies such as Windows Update, Active Directory, and Failover Clustering.

Hyper-V Server is a free download, and its installation is activated automatically. However, every operating system that runs in a hosted virtual machine requires a proper license.

Additional information:

Automatic Virtual Machine Activation

Microsoft Hyper-V Server

Manage Hyper-V Server Remotely

Task 2: Define network configuration

In Step 2, Task 2 above, we discussed the design considerations for virtual machine networking. Now we will discuss the networking considerations for the host. There are several types of network traffic that you must consider and plan for when you deploy Hyper-V. Design your network configuration with the following goals in mind:

  • To ensure network QoS

  • To provide network redundancy

  • To isolate traffic to defined networks

Task 2a: Define network traffic types

When you deploy a Hyper-V cluster, you must plan for several types of network traffic. The following table summarizes the traffic types.

Traffic type

Description

Management

  • Provides connectivity between the server that is running Hyper-V and basic infrastructure functionality

  • Used to manage the Hyper-V host operating system and virtual machines

Cluster and CSVs

  • Used for internode cluster communication such as the cluster heartbeat and Cluster Shared Volumes (CSV) redirection

  • Used only when Hyper-V is deployed by using Failover Clustering

Live migration

Used for virtual machine live migration and shared nothing live migration

Storage

Used for SMB traffic or for iSCSI traffic

Replica

Used for virtual machine replication traffic through the Hyper-V Replica feature

Virtual machine (tenant) traffic

  • Used for virtual machine connectivity

  • Typically requires external network connectivity to service client requests

Note: See Step 2: Plan for virtual machine configuration for a list of virtual machine traffic types.

Backup

Used to back up virtual hard disk files

Task 2b: Define network traffic performance options

Each network traffic type will have maximum and minimum bandwidth requirements and minimum latency requirements. Following are the strategies that can be used to meet different network performance requirements.

Policy-based QoS

When you deploy a Hyper-V cluster, you need a minimum of six traffic types or networks, and each network requires redundancy. That quickly adds up to 12 network adapters per host. It is possible to install multiple quad-port network adapters, but at some point you will run out of slots in your host.

Networking equipment keeps getting faster. Not long ago, 1 gigabit Ethernet (GbE) network adapters were top of the line. Today, 10 GbE adapters in servers are becoming more common, and the cost of supporting a 10 GbE infrastructure is becoming more reasonable.

Installing two teamed 10 GbE network adapters provides more bandwidth than two quad-port 1 GbE adapters, requires fewer switch ports, and simplifies your cabling. As you converge more of your network traffic types onto the teamed 10 GbE network adapters, policy-based QoS allows you to manage the network traffic so that it properly meets the needs of your virtualization infrastructure.

Policy-based QoS enables you to specify network bandwidth control based on application type, users, and computers. QoS policies allow you to meet the service requirements of a workload or an application by measuring network bandwidth, detecting changing network conditions (such as congestion or availability of bandwidth), and prioritizing (or throttling) network traffic.

In addition to the ability to enforce maximum bandwidth, QoS policies in Windows Server 2012 R2 provide a new bandwidth management feature: minimum bandwidth. Unlike maximum bandwidth, which is a bandwidth cap, minimum bandwidth is a bandwidth floor, and it assigns a certain amount of bandwidth to a given type of traffic. You can simultaneously implement minimum and maximum bandwidth limits. A sketch of example policies appears after the following table.

Advantages

Disadvantages

  • Managed by Group Policy

  • Easily applied to VLANs to provide proper bandwidth settings when multiple VLANs are running on the network adapter or using NIC Teaming

  • Policy-based QoS can be applied to IPsec traffic

  • Does not provide bandwidth management to traffic that is using a virtual switch

  • Hyper-V hosts must be domain joined

  • Software-based QoS policies and hardware-based QoS policies (DCB) should not be used at the same time
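As a sketch of what policy-based QoS rules can look like, the following Windows PowerShell commands define a minimum bandwidth weight for live migration traffic and a bandwidth cap for traffic to a backup server. The policy names, the weight, the destination address, and the throttle rate are assumptions for illustration.

```powershell
# Guarantee live migration traffic at least a 30 percent share of bandwidth during congestion.
New-NetQosPolicy -Name "Live Migration" -LiveMigration -MinBandwidthWeightAction 30

# Cap traffic sent to an example backup server at roughly 1 Gbps.
New-NetQosPolicy -Name "Backup" -IPDstPrefixMatchCondition "10.0.10.20/32" -ThrottleRateActionBitsPerSecond 1GB
```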

Additional information:

Quality of Service (QoS) Overview

Policy-based Quality of Service

Data Center Bridging

Data Center Bridging (DCB) provides hardware-based bandwidth allocation to a specific type of traffic, and it enhances Ethernet transport reliability with the use of priority-based flow control. DCB is recommended when you use FCoE or iSCSI. A sample configuration appears after the following table.

Advantages

Disadvantages

  • Support for Microsoft iSCSI

  • Support for FCoE

  • Hardware investments required, including:

    • DCB-capable Ethernet adapters

    • DCB-capable hardware switches

  • Complex to deploy and manage

  • Does not provide bandwidth management for virtual switch traffic

  • Software-based QoS policies and DCB policies should not be used at the same time
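The following sketch outlines one possible DCB configuration for prioritizing SMB Direct (RDMA) storage traffic on a 10 GbE adapter. The priority value, bandwidth percentage, and adapter name are assumptions, and the matching settings must also be configured on the DCB-capable switches.

```powershell
# Install the DCB feature on the host.
Install-WindowsFeature -Name Data-Center-Bridging

# Tag SMB Direct traffic (port 445) with 802.1p priority 3.
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Reserve 40 percent of bandwidth for that priority and enable priority-based flow control.
New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 40 -Algorithm ETS
Enable-NetQosFlowControl -Priority 3

# Apply the DCB settings to the physical adapter.
Enable-NetAdapterQos -Name "Ethernet 10GbE"
```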

Additional information:

Data Center Bridging (DCB) Overview

SMB Direct

SMB Direct (SMB over remote direct memory access, or RDMA) is a feature of the SMB 3.0 storage protocol in Windows Server 2012 R2. It enables direct memory-to-memory data transfers between the server and storage, uses minimal CPU, and relies on standard RDMA-capable network adapters. The result is extremely fast responses to network requests, which makes remote file storage response times comparable to directly attached block storage.

Advantages

Disadvantages

  • Increased throughput: Leverages the full throughput of high speed networks where the network adapters coordinate the transfer of large amounts of data at line speed

  • Low latency: Provides extremely fast responses to network requests, and as a result, makes remote file storage seem like it is directly attached block storage

  • Low CPU utilization: Uses fewer CPU cycles when transferring data over the network, which frees up more CPU cycles for the virtual machines

  • Live migration can be configured to use SMB Direct for faster live migrations.

  • Enabled by default on the host

  • The SMB client automatically detects and uses multiple network connections if an appropriate configuration is identified

  • Configure SMB bandwidth management to set limits for live migration, virtual machines, and default storage traffic

  • SMB Multichannel does not require RDMA-supported adapters

  • RDMA-enabled network adapters are not compatible with NIC Teaming

  • Requires two or more RDMA network adapters to be deployed in each host to provide high availability

  • Currently limited to the following types of network adapters:

    • iWARP

    • Infiniband

    • RoCE

  • RDMA with RoCE requires DCB for flow control.
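To confirm that hosts can actually use SMB Direct, you might check the RDMA capability of the installed adapters and, after storage traffic has started, verify the multichannel connections. The following Windows PowerShell sketch uses only built-in SMB and NetAdapter cmdlets; no names are assumed.

```powershell
# List adapters that expose RDMA and show whether it is enabled.
Get-NetAdapterRdma

# Show the SMB client's view of the interfaces, including RSS and RDMA capability.
Get-SmbClientNetworkInterface

# After SMB storage traffic has started, confirm which connections are using RDMA.
Get-SmbMultichannelConnection
```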

Receive Segment Coalescing

Receive segment coalescing (RSC) reduces CPU utilization for inbound network processing by offloading tasks from the CPU to an RSC-capable network adapter.

Advantages

Disadvantages

  • Improves the scalability of the servers by reducing the overhead for processing a large amount of inbound network traffic

  • Minimizes the CPU cycles that are spent for network storage and live migrations

  • Requires a RSC-capable network adapter

  • Does not provide significant improvement for send-intensive workloads

  • Not compatible with IPsec encrypted traffic

  • Applies to host traffic only. To apply RSC to virtual machine traffic, the virtual machine must be running Windows Server 2012 R2 and be configured with an SR-IOV network adapter.

  • Not enabled by default on servers upgraded to Windows Server 2012 R2

Receive Side Scaling

Receive-side scaling (RSS) enables network adapters to distribute the kernel-mode network processing load across multiple processor cores in multiple core computers. The distribution of this processing makes it possible to support higher network traffic loads than would be possible if only a single core is used. RSS achieves this by spreading the network processing load across many processors and actively load balancing traffic that is terminated by the Transmission Control Protocol (TCP).

Advantages

Disadvantages

  • Spreads interrupt processing across multiple processors, so a single processor is not required to handle all I/O interrupts, as was the case with earlier versions of Windows Server.

  • Works with NIC Teaming

  • Works with User Datagram Protocol (UDP) traffic

  • Requires a RSS-capable network adapter

  • Disabled when the physical network adapter is bound to a virtual switch; VMQ is used instead of RSS for network adapters that are bound to a virtual switch.
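Both RSC and RSS are enabled per physical adapter. The following sketch checks and enables them on a single adapter; the adapter name is an assumption.

```powershell
# Check and enable receive segment coalescing on an example adapter.
Get-NetAdapterRsc -Name "Ethernet 10GbE"
Enable-NetAdapterRsc -Name "Ethernet 10GbE"

# Check and enable receive-side scaling on the same adapter.
Get-NetAdapterRss -Name "Ethernet 10GbE"
Enable-NetAdapterRss -Name "Ethernet 10GbE"
```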

SR-IOV

Hyper-V supports SR-IOV-capable network devices and allows the direct assignment of an SR-IOV virtual function of a physical network adapter to a virtual machine. This increases network throughput, reduces network latency, and reduces the host CPU overhead that is required for processing network traffic.

For additional information about SR-IOV, see the virtual machine networking performance options in Step 2, Task 2 above.
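Note that SR-IOV must be enabled when the external virtual switch is created; it cannot be turned on for an existing switch. The following sketch assumes an SR-IOV-capable adapter and uses illustrative switch and virtual machine names.

```powershell
# Create an external virtual switch with SR-IOV enabled (this can only be set at creation time).
New-VMSwitch -Name "External-IOV" -NetAdapterName "Ethernet 10GbE" -EnableIov $true

# Request a virtual function on the physical adapter for a virtual machine's network adapter.
Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100
```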

Task 2c: Define network traffic high availability and bandwidth aggregation strategy

NIC Teaming, also known as load balancing and failover (LBFO), allows multiple network adapters to be placed into a team for the purposes of bandwidth aggregation and traffic failover. This helps maintain connectivity in the event of a network component failure.

Previously, this capability was available only from network adapter vendors. Beginning with Windows Server 2012, NIC Teaming is included as a feature of the Windows Server operating system.

NIC Teaming is compatible with all networking capabilities in Windows Server 2012 R2 with three exceptions:

  • SR-IOV

  • RDMA

  • 802.1X authentication

From a scalability perspective, in Windows Server 2012 R2, a minimum of 1 and a maximum of 32 network adapters can be added to a single team. An unlimited number of teams can be created on a single host.
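For example, the following Windows PowerShell sketch creates a two-member, switch-independent team that uses the Dynamic load-balancing mode introduced in Windows Server 2012 R2. The team and adapter names are assumptions.

```powershell
# Create a two-member NIC team for converged host traffic.
New-NetLbfoTeam -Name "ConvergedTeam" `
    -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent `
    -LoadBalancingAlgorithm Dynamic
```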

Additional information:

NIC Teaming Overview

Microsoft Virtual Academy: NIC Teaming in Windows Server 2012

NIC Teaming (NetLBFO) Cmdlets in Windows PowerShell

Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management

Converged Data Center with File Server Storage

Task 2d: Define network traffic isolation and security strategy

Each network traffic type may have different security requirements for functions such as isolation and encryption. The following table lists the strategies that can be used to meet various security requirements.

Strategy

Advantages

Disadvantages

Encryption (IPsec)

Traffic is secured while traversing the wire

  • Performance impact to encrypt and decrypt traffic

  • Complex to configure, manage, and troubleshoot

  • Incorrect IPsec configuration changes can cause network disruptions or traffic to not be properly encrypted

Separate physical networks

Network is physically separated

  • Requires additional network adapters to be installed in the host

  • If network requires high availability, two or more network adapters are required for each network.

Virtual local area network (VLAN)

  • Isolates traffic by using an assigned VLAN ID

  • Support for VLAN Trunking Protocol

  • Support for private VLANs

  • Already used by many enterprise customers

  • Limited to 4094 VLANs, and most switches support only 1000 VLANs

  • Requires additional configuration and management of networking equipment

  • VLANs cannot span multiple Ethernet subnets, which limits the number of nodes in a single VLAN and restricts the placement of virtual machines, based on physical location.

Task 2e: Define physical network adapters

With an understanding of the types of traffic required by the virtualization server hosts, and the performance, availability, and security strategies for the traffic, you can determine how many physical network adapters are required for each host and the types of network traffic that will be transmitted over each adapter.

Task 2f: Define virtual switches

To connect a virtual machine to a network, you need to connect the network adapter to a Hyper-V virtual switch.

There are three types of virtual switches that can be created in Hyper-V (an example of creating each type appears after this list):

  • External virtual switch 

    Use an external virtual switch when you want to provide virtual machines with access to a physical network to communicate with externally located servers and clients. This type of virtual switch also allows virtual machines on the same host to communicate with each other. This type of network may also be available for use by the host operating system, depending on how you configure the networking.

    Important: A physical network adapter can only be bound to one virtual switch at a time.

  • Internal virtual switch

    Use an internal virtual switch when you want to allow communication between virtual machines on the same host and between virtual machines and the host operating system. This type of virtual switch is commonly used to build a test environment in which you need to connect to the virtual machines from the host operating system. An internal virtual switch is not bound to a physical network adapter. As a result, an internal virtual network is isolated from external network traffic.

  • Private virtual switch

    Use a private virtual switch when you want to allow communication only between virtual machines on the same host. A private virtual switch is not bound to a physical network adapter. A private virtual switch is isolated from all external network traffic on the virtualization server, and from any network traffic between the host operating system and the external network. This type of network is useful when you need to create an isolated networking environment, such as an isolated test domain.

    Note: Private and internal virtual switches do not benefit from the hardware acceleration features that are available to a virtual machine connected to an external virtual switch.
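The following sketch shows how each virtual switch type might be created with Windows PowerShell. The switch names are assumptions, and the external switch is bound to the example NIC team created earlier; set -AllowManagementOS according to whether the host operating system should share that adapter.

```powershell
# External switch bound to a physical adapter or team; the host OS does not share it here.
New-VMSwitch -Name "External-Tenant" -NetAdapterName "ConvergedTeam" -AllowManagementOS $false

# Internal switch: virtual machines and the host operating system can communicate.
New-VMSwitch -Name "Internal-Test" -SwitchType Internal

# Private switch: only virtual machines on this host can communicate.
New-VMSwitch -Name "Private-Isolated" -SwitchType Private
```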

Design decision - The decisions you make in all the tasks of this step can be entered in the Virtualization hosts worksheets.

Tip: Virtual switches that connect to the same network on different hosts should have the same name. This eliminates confusion about which virtual switch a virtual machine should be connected to, and it simplifies moving a virtual machine from one host to another. The Move-VM Windows PowerShell cmdlet will fail if the same virtual switch name is not found on the destination host.

Task 3: Define storage configuration

In addition to the storage required for the host operating system, each host requires access to storage where the virtual machine configuration files and virtual hard disks are stored. This task will focus on the virtual machine storage.

Task 3a: Define data types

The following are the sample data types you need to consider for your storage requirements.

Data type

Storage location of data type

Host operating system files

Typically on a local hard drive

Host page file and crash dumps in Windows

Typically on a local hard drive

Failover cluster shared state

Shared network storage or cluster shared volume

Virtual hard disk files and virtual machine configuration file

Typically on shared network storage or cluster shared volume

The remainder of this step is focused on the storage required for the virtual machines.

Task 3b: Storage options

The following options are available for storing the virtual machine configuration files and virtual hard disks.

Option 1: Direct-attached storage

Direct-attached storage refers to a computer storage system that is directly attached to your server, instead of being attached directly to a network. Direct-attached storage is not limited to only internal storage. It can also use an external disk enclosure that contains hard disk drives, including just-a-bunch-of-disks (JBOD) enclosures and enclosures that are connected through SAS or another disk controller.

Advantages

Disadvantages

  • Does not require a storage network

  • Fast disk I/O because storage requests do not travel over a network

  • Can be internal storage or an external disk enclosure, including JBODs

  • You can use JBOD with the Storage Spaces technology to combine all of your physical disks into a storage pool, and then create one or more virtual disks (called Storage Spaces) out of the free space in the pool.

  • JBOD enclosures are typically less expensive and often more flexible and easier to manage than RAID enclosures because they use the Windows or Windows Server operating system to manage the storage instead of dedicated RAID adapters.

  • Limited in the number of servers that can be attached to the external disk enclosure

  • Only external shared storage, such as shared SAS with Storage Spaces, provides support for Failover Clustering

Option 2: Network-attached storage

Network-attached storage devices connect storage to a network where they are accessed through file shares. Unlike direct-attached storage, they are not directly attached to the computer.

Network-attached storage devices support Ethernet connections, and they typically allow an administrator to manage disk space, set disk quotas, provide security, and use checkpoint technologies. Network-attached storage devices support multiple protocols, including Network File System (NFS), Common Internet File System (CIFS), and Server Message Block (SMB).

Advantages

Disadvantages

  • Simpler to set up than SAN storage, requiring less dedicated storage hardware

  • Plug and play

  • Can use existing Ethernet network

  • The network-attached storage device must support SMB 3.0; CIFS is not supported

  • Not directly attached to the host servers that are accessing the storage

  • Slower than other options

  • Typically require a dedicated network for optimal performance

  • Limited management and functionality

  • Hyper-V supports only network-attached storage devices that support SMB 3.0; SMB 2.0 and CIFS are not supported

  • May or may not support RDMA

Option 3: Storage area network

A storage area network (SAN) is a dedicated network that allows you to share storage. A SAN consists of a storage device, the interconnecting network infrastructure (switches, host bus adapters, and cabling), and servers that are connected to this network. SAN devices provide continuous and fast access to large amounts of data. The communication and data transfer mechanism for a given deployment is commonly known as a storage fabric.

A SAN uses a separate network, and it is generally not accessible by other devices through the local area network. A SAN can be managed by using Storage Management Initiative Specification (SMI-S), Simple Network Management Protocol (SNMP), or a proprietary management protocol.

A SAN does not provide file abstraction, only block-level operations. The most common SAN protocols are iSCSI, Fibre Channel, and Fibre Channel over Ethernet (FCoE). An SMI-S or a proprietary management protocol can deliver additional capabilities, such as disk zoning, disk mapping, LUN masking, and fault management.

Advantages

Disadvantages

  • SAN uses a separate network, so there is limited impact on the data network

  • Provides continuous and fast access to large amounts of data

  • Typically provides additional features such as data protection and replication

  • Can be shared amongst various teams

  • Support for Virtual Fibre Channel for direct access to storage LUNs

  • Support for guest clustering

  • Virtual machines that need access to data volumes greater than 64 TB can use virtual Fibre Channel for direct LUN access

  • Expensive

  • Requires specialized skills to deploy, manage, and maintain

  • HBA or FCoE network adapters need to be installed in each host.

  • Migrating a Hyper-V cluster requires additional planning and limited downtime.

  • To provide bandwidth management for FCoE traffic, a hardware QoS policy that uses datacenter bridging is required.

  • FCoE traffic is not routable.

Option 4: Server Message Block 3.0 file shares

Hyper-V can store virtual machine files, such as configuration files, virtual hard disk files, and checkpoints, on file shares that use the Server Message Block (SMB) 3.0 protocol. The file shares are typically located on a scale-out file server to provide redundancy; if one node of the scale-out file server is down, the file shares remain available from the other nodes. An example deployment appears after the following table.

Advantages

Disadvantages

  • Option to use existing networks and protocols

  • SMB Multichannel provides an aggregation of network bandwidth and fault tolerance when multiple paths are available between the server running Hyper-V and the SMB 3.0 file share.

  • You can use JBOD with the Storage Spaces technology to combine all of your physical disks into a storage pool, and then create one or more virtual disks (called Storage Spaces) out of the free space in the pool.

  • SMB Multichannel can be used for virtual machine migrations.

  • Less expensive than SAN deployments

  • Flexible storage configurations on the file server running Windows Server

  • Separate Hyper-V services from storage services, which enables you to scale each service as needed

  • Provides flexibility when upgrading to the next version when running a Hyper-V cluster. You can upgrade the servers running Hyper-V or the scale-out file servers in any order without downtime. You need enough capacity in the cluster to remove one or two nodes to perform the upgrade.

  • Scale-out file server provides support for shared VHDX

  • SMB bandwidth management allows you to set limits for live migration, virtual hard disk, and default storage traffic.

  • Support for SMB traffic encryption with minimal impact on performance

  • Save disk space with data deduplication for VDI deployments

  • Does not require specialized skills to deploy, manage, and maintain

  • I/O performance is not as fast as in SAN deployments.

  • Data Deduplication is not supported on running virtual machine files, except for VDI deployments.
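As a sketch of this option, the following Windows PowerShell commands create a virtual machine whose configuration file and virtual hard disk are stored on an SMB 3.0 share, and then cap live migration traffic over SMB. The share path, virtual machine name, sizes, and limit are assumptions, and Set-SmbBandwidthLimit requires the SMB Bandwidth Limit feature to be installed.

```powershell
# Create a virtual machine that stores its files on an SMB 3.0 file share.
New-VM -Name "WEB02" `
    -MemoryStartupBytes 2GB `
    -Generation 2 `
    -Path "\\sofs.contoso.com\VMStore" `
    -NewVHDPath "\\sofs.contoso.com\VMStore\WEB02\WEB02.vhdx" `
    -NewVHDSizeBytes 60GB

# Optionally limit live migration traffic over SMB to about 500 MB per second.
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 500MB
```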

SMB Direct

SMB Direct works as part of SMB 3.0 file shares. It requires network adapters and switches that support RDMA to provide full-speed, low-latency storage access, which enables a remote file server to resemble local, direct-attached storage. In addition to the benefits of SMB, SMB Direct has the following advantages and disadvantages.

Advantages

Disadvantages

  • Functions at full speed with low latency, while using very little CPU

  • Enables a scale-out file server to deliver storage performance and resiliency similar to a traditional SAN by using Microsoft storage solutions and inexpensive shared direct-attached storage

  • Provides the fastest option for live migrations and storage migrations

  • Not supported with NIC Teaming

  • Two or more RDMA-enabled network adapters are required for redundant connections to the storage.

Scale-out file server

Figure SEQ Figure \* ARABIC 3: Sample scale-out file server that uses converged networking with RDMA

Additional information:

Provide cost-effective storage for Hyper-V workloads by using Windows Server

Converged Data Center with File Server Storage

Deploy Hyper-V over SMB

Achieving over 1-Million IOPS from Hyper-V VMs in a Scale-Out File Server Cluster Using Windows Server 2012 R2

Task 3c: Define physical drive architecture types

The type of physical drive architecture that you select for your storage will impact the performance of your storage solution. For additional information about disk types, see Section 7.1 of Infrastructure-as-a-Service Product Line Architecture.

Task 3d: Define storage networking type

The storage controller or storage networking controller types that you use are determined by the storage option that you select for each host group. For more information, see Task 3b: Storage options.

Task 3e: Determine which storage type to use for each data type

With an understanding of your data types, you can now determine which storage option, storage controller, storage networking controller, and physical disk architectures best meet your requirements.

Design decision - The decisions you make in this task can be entered in the Virtualization hosts worksheet.

Additional information:

Networking configurations for Hyper-V over SMB in Windows Server 2012 and Windows Server 2012 R2

Windows Server 2012 Hyper-V Component Architecture Poster and Companion References

Storage Technologies Overview

Task 4: Define server virtualization host scale units

Purchasing individual servers requires procurement, installation, and configuration for each server. Scale units enable you to purchase collections of servers (typically with identical hardware) that come preconfigured, which enables you to add capacity to the datacenter by adding scale units rather than individual servers.

The following image illustrates a scale unit that could have been purchased preconfigured from any number of hardware vendors. It includes a rack, an uninterruptable power supply (UPS), a pair of redundant network switches for the servers contained within the rack, and ten servers.

Host scale unit

Figure SEQ Figure \* ARABIC 4: Example of a virtualization server host scale unit

The scale unit comes preconfigured and pre-cabled to the UPS and network switches. The unit simply needs to be added to a datacenter, plugged into electrical power, and connected to the network and storage. Then it is ready to be used. If the individual components were not purchased as a scale unit, the purchaser would need to rack and wire all of the components.

Design decision - If you decide to use server virtualization host scale units, you can define the hardware for your virtualization host scale units in the Host scale units worksheet.

Tip: You can purchase preconfigured scale units from a variety of Microsoft hardware partners through the Microsoft Private Cloud Fast Track program.

Task 5: Define server virtualization host availability strategy

Virtualization server hosts may become unavailable for planned reasons (such as maintenance) or unplanned reasons. Following are some strategies that can be used for both.

Planned

You can use live migration to move the virtual machines from one host to another host. This requires no downtime for virtual machines.

Unplanned

This scenario depends on the workload characterization types that the host is hosting.

  • For shared stateful workloads, use Failover Clustering within the virtual machines.

  • For stateful workloads, run as a high availability virtual machine on a Hyper-V cluster.

  • For stateless workloads, start new instances manually or through some automated means.

If you are using Failover Clustering in Windows Server with Hyper-V, consider whether to use the features listed in the following table. For additional information about each feature, click the hyperlink.

Functionality

Considerations

Hyper-V application monitoring

Monitor a virtual machine for failures in networking and storage that are not monitored by the Failover Clustering service.

Virtual machine priority settings

  • Set the virtual machine priority, based on the workload. You can assign the following priority settings to high availability virtual machines (also known as clustered virtual machines):

    • High

    • Medium (default)

    • Low

    • No Auto Start

  • Clustered roles with higher priority are started and are placed on nodes before those with lower priority.

  • If a No Auto Start priority is assigned, the role does not come online automatically after it fails, which keeps resources available so other roles can start.

Virtual machine anti-affinity

Set anti-affinity for virtual machines that you do not want to run on the same node in a Hyper-V cluster. This could be for virtual machines that provide redundant service or are part of guest virtual machine cluster.

Note: Anti-affinity settings are configured by using Windows PowerShell (see the example after this table).

Automated node draining

  • The cluster automatically drains a node (moves the clustered roles that are running on the node to another node) before putting the node into maintenance mode or making other changes on the node.

  • Roles fail back to the original node after maintenance operations.

  • Administrators can drain a node with a single action in Failover Cluster Manager or by using the Windows PowerShell cmdlet, Suspend-ClusterNode. The target node for the moved clustered roles can be specified.

  • Cluster-Aware Updating uses node draining in the automated process to apply software updates to cluster nodes.

Cluster-Aware Updating

  • Cluster-Aware Updating enables you to update nodes in a cluster without impacting the virtual machines running in your cluster.

  • A sufficient number of cluster nodes must remain available during the update process to handle the load of the running virtual machines.

Preemption of virtual machines based on priority

Another reason to set virtual machine priority is that the cluster service can take offline a lower priority virtual machine when a high-priority virtual machine does not have the necessary memory and other resources to start.

  • Preemption starts with the lowest priority virtual machine and continues to higher priority virtual machines.

  • Virtual machines that are preempted are later restarted in priority order.

Note: Hyper-V clusters can have a maximum of 64 nodes and 8,000 virtual machines.
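The following Windows PowerShell sketch illustrates the priority, anti-affinity, and node-draining settings described in the table above. The clustered role names, anti-affinity class name, and node name are assumptions; priority values map to 3000 (High), 2000 (Medium), 1000 (Low), and 0 (No Auto Start).

```powershell
# Raise the priority of an important clustered virtual machine.
(Get-ClusterGroup -Name "SQL01").Priority = 3000

# Keep the two members of a guest cluster on different Hyper-V nodes.
$antiAffinity = New-Object System.Collections.Specialized.StringCollection
[void]$antiAffinity.Add("SQLGuestCluster")
(Get-ClusterGroup -Name "SQL01").AntiAffinityClassNames = $antiAffinity
(Get-ClusterGroup -Name "SQL02").AntiAffinityClassNames = $antiAffinity

# Drain a node (move its clustered roles elsewhere) before maintenance.
Suspend-ClusterNode -Name "HV-NODE01" -Drain
```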

Step 5: Plan for virtualization fabric architecture concepts

This step requires defining logical concepts to which the fabric architecture will align.

Task 1: Define maintenance domains

Maintenance domains are logical collections of servers that are serviced together. Servicing may include hardware or software upgrades or configuration changes. Maintenance domains typically span host groups of each type or within each location, though they don’t have to. The purpose is to prevent server maintenance from adversely impacting any consumers’ workloads.

Note: This concept applies to physical network and storage components.

Task 2: Define physical fault domains

Groups of virtualization server hosts often fail together as the result of a failed shared infrastructure component, such as a network switch or uninterruptable power supply (UPS). Physical fault domains help support resiliency within the virtualization fabric. It is important to understand how a fault domain impacts each of the host groups you defined for your fabric.

Note: This concept applies to physical network and storage components.

Consider the example in the following image, which overlays maintenance and physical fault domains over a collection of host groups within a datacenter.

Fault domain

Figure SEQ Figure \* ARABIC 5: Example of a maintenance and physical fault domain definition

In this example, each rack of servers is defined as a separate, numbered physical fault domain. This is because each rack contains a network switch at the top and a UPS at the bottom. All servers within the rack rely on these two components, and if either fails, all servers in the rack effectively fail.

Because all servers within a rack are also members of unique host groups, this design provides no mitigation if any of the physical fault domains fails. To mitigate the issue, you could add physical fault domains for each host group type. In smaller scale environments, you could add redundant switches and power supplies in each rack, or use Failover Clustering for virtualization server hosts across physical fault domains.

In Figure 5, each of the colored, dashed-line boxes defines a maintenance domain (they are labeled MD 1 through 5). Note how each of the servers in the load-balanced cluster of virtual machines is hosted on a server virtualization host that is contained within a separate maintenance domain and a separate physical fault domain.

This enables the fabric administrator to take down all virtualization server hosts within a maintenance domain without significantly impacting applications that have multiple servers spread across maintenance domains. It also means that the application running on the load-balanced cluster is not completely unavailable if a physical fault domain fails.

Design decision - The decisions you make for Tasks 1 and 2 can be entered in the Settings worksheet.

Task 3: Define reserve capacity

The failure of individual servers in the fabric is inevitable. The fabric design needs to accommodate individual server failure, just as it accommodates failures of collections of servers in fault and maintenance domains. The following illustration is the same as Figure 5, but it uses red to identify three failed servers.

Failed servers

Figure SEQ Figure \* ARABIC 6: Failed servers

In Figure 6, server virtualization hosts have failed in the following host groups, maintenance domains, and physical fault domains.

Host group    Physical fault domain    Maintenance domain
2             2                        3
3             3                        2
4             4                        2

The application running on the load-balanced cluster is still available, even though the host in Physical fault domain 2 has failed, but the application operates with one-third less capacity.

Consider what would happen if the server virtualization host that hosts one of the virtual machines in Physical fault domain 3 also failed, or if Maintenance domain 2 was taken down for maintenance. In either case, the capacity for the application would decrease by two-thirds.

You may decide that’s unacceptable for your virtualization fabric. To mitigate the impact of failed servers, you can ensure that each of your physical fault domains and maintenance domains have enough reserve capacity so that capacity will never drop below the acceptable level that you define.

For more information about calculating reserve capacity, see Reserve Capacity in Cloud Services Foundation Reference Architecture – Principles, Concepts, and Patterns.

Step 6: Plan for initial capability characteristics

After completing all of the tasks in this document, you will be able to determine the initial costs to host virtual machines and storage on the fabric, in addition to the initial service quality levels that the fabric can meet. You won’t be able to finalize either of these tasks, however, until you implement your fabric management tools and human resources, which are discussed in the Next Steps section of this document.

Task 1: Define initial SLA metrics for storage and virtual machines

As a fabric administrator, you’ll probably define a service level agreement (SLA) that details the service quality metrics that the fabric will meet. Your virtual machine administrators will need to know this to plan how they’ll use the fabric.

At a minimum, this will likely include an availability metric, but it may also include other metrics. To get an idea of a baseline for virtualization fabric SLA metrics, you can review those offered by public cloud providers such as Microsoft Azure. For virtual machine hosting, the Azure SLA guarantees that when a customer deploys two or more instances of a virtual machine running the same workload, and deploys those instances in different fault and upgrade domains (referred to as “maintenance domains” in this document), at least one of those virtual machines will be available 99.95% of the time.

For a full description of the Azure SLA, please see Service Level Agreements. Optimally, your virtualization fabric will meet or exceed those of public cloud providers.

Task 2: Define initial costs to host storage and virtual machines

With your fabric designed, you’ll also be able to calculate:

  • The hardware, space, power, and cooling costs of the fabric

  • The hosting capacity of the fabric

This information, combined with your other costs, such as the cost of your fabric management tools and human resources, will enable you to determine your final costs to host virtual machines and storage.

To get an idea of the baseline costs for virtual machines and storage, you can review the hosting costs of public cloud providers such as Microsoft Azure. For more information, see Virtual Machine Pricing Details.

Although not always the case, you will typically find that your hosting costs are higher than those of public providers because your fabric will be much smaller than the fabrics of large public providers who are able to attain volume discounts on hardware, datacenter space, and power.

Next steps

After you complete all the tasks in this document, you’ll have a fabric design that meets your organization’s requirements. You’ll also have an initial service characteristic definition that includes the costs and service-level metrics. You won’t be able to determine your final service-level metrics and costs until you determine the human resources costs and the management tools and processes that you’ll use for your fabric.

Microsoft System Center 2012 provides a comprehensive set of functionality to enable you to provision, monitor, and maintain your virtualization fabric. You can learn more about how to use System Center for fabric management by reading the following resources:

System Center Technical Documentation Library

Fabric Management Architecture Guide