Improving IT Efficiency at Microsoft Using Virtual Server 2005
Technical White Paper
Published: August 1, 2005
Situation
At Microsoft, pursuit of improvements in operational efficiencies led to the consolidation
of a number of physical datacenters. The creation of a utility model concentrated
many administrative and management tasks in the hands of teams of dedicated computing
professionals. The success of these initiatives fostered the search for additional
methods and tools to further improve efficiencies and lower costs.
Solution
Virtual Server 2005 provided Microsoft with the means to take consolidation to the
logical level. The Virtual Server Utility team assumed responsibility for deployment.
Internal customers were recruited for the pilot, with aggressive SLA metrics as
compelling incentives.
Benefits
- Reduction in server provisioning intervals from 22-25 days to 1 day
- Cost reductions of ~30% over 3 years
- Improved customer satisfaction
Products & Technologies
- Virtual Server 2005
- Microsoft Operations Manager
- Systems Management Server
Executive Summary
Consolidation of physical infrastructure, in general, is an effective business strategy.
Consolidation of locally situated physical servers has proved effective in reducing
server sprawl and, thereby, improving IT efficiency, enhancing flexibility and reducing
Total Cost of Ownership (TCO). Virtualization takes consolidation to a new level,
breaking the 1:1 relationship between application and server. Virtualization is
a consolidation technique that yields additional benefits by abstracting the applications
from the physical server and placing them on Virtual Machines (VMs), many of which
can reside on a single Physical Host.
Virtual Server 2005 (VS), the Microsoft virtualization solution, is part of the
consolidation strategy, which includes a utility model for IT services. The Virtual
Server Utility (VSU), created by Microsoft IT, offers VS to internal Microsoft customers
as a centralized managed service, backed up by a Service Level Agreement (SLA) that
compares most favorably with the conventional scenario involving on-site servers
provisioned and managed by local Business Unit IT (BUIT) departments. The SLA comprises
a number of metrics that not only present a clear and compelling case to the clients,
but also a challenge to the VSU team. Those metrics include server provisioning
timelines, support availability, host availability, guest availability and host
CPU utilization. Cost savings, of course, are the bottom line.
The actual experience with Virtual Server 2005 at Microsoft was highly favorable.
Server provisioning intervals were reduced from 22-25 days for a self-hosted physical
server to one day for a virtual server. Cost savings to the clients were approximately
30 percent over three years and customer satisfaction improved. Across the entire
SLA, metric by metric, actual results met or exceeded expectations.
The purpose of this white paper is to share Microsoft experiences with Virtual Server
2005 in the pilot implementation. As Microsoft IT requirements are among the most
challenging in the world, the methods Microsoft IT employed and the lessons it learned
from this pilot implementation should provide highly meaningful guidance for customers
in subsequent general release implementations involving enterprise-scale IT environments.
Introduction
When considering Microsoft products and solutions, decision makers frequently ask
about experiences in using them within Microsoft. Microsoft IT not only provides
traditional IT functions for the company, but also acts as the company's first customer
for each new server and business productivity software release. As Microsoft IT
requirements are among the most challenging in the world, the methods Microsoft
IT employs and the lessons it learns from those first experiences often provide
highly meaningful deployment and operational guidance for customers in subsequent
general release implementations.
Microsoft IT has created a Compute Utility Team as part of its Utility Services
Team. The utility model positions these teams as utilities, or service providers,
chartered to leverage their expert resources on behalf of internal application and
service owners. Identified services include Compute, Storage and Data Protection.
The Compute Utility offers a service based on Virtual Server 2005, designed specifically
for hosting low- to medium-intensity applications and services that require some
measure of isolation. This service consolidates applications and services, placing
them on a shared resource in the form of a Virtual Machine (VM), several of which
reside on a single physical Virtual Server Host placed under the centralized management
and administration of a dedicated team of computing professionals. This approach
offers a highly reliable and extremely efficient means of addressing the computing
needs of Line of Business (LOB) applications and services. At the same time, it
relieves the application owners of many of the risks and complexities associated
with direct involvement in the day-to-day administration of physically distinct
servers. The owners can realize considerable and quantifiable capital savings in
equipment and can recapture precious data center space through reductions in the
physical footprint of the server solution, which yields reduced rack space requirements.
Operational cost savings include reduced overhead, power and environmental control
systems. Further, they can expect to enjoy increased operational agility as VS increases
the speed of provisioning and move, add and change activities. Security is always
a consideration. Depending on the specifics of the implementation scenario, virtualization
can lead to enhanced security, realized through the reduction of the overall attack
profile, standardization of hardware and operating systems, thorough implementation
of advanced security systems, and constant vigilance of the centralized utility
services team. Each VM and, depending on the implementation specifics, even each
application retains some measure of isolation, as each is associated with a separate
operating system instance. The application and service owners realize these benefits
immediately and can expect them to increase into the future as the business unit's
computing and networking demands escalate and the challenges of supporting them
intensify.
Despite the benefits of consolidation, stakeholders in some business units tend
to view consolidation with some degree of trepidation. Stakeholders fear that surrendering
day-to-day operational responsibility to a centralized services utility carries
with it a general loss of control of their applications. Specifically, stakeholders
express concern that a centralized utility services group would be less responsive
than their localized IT support group and, therefore, their core business activities
would be impacted through a loss of operational agility and overall performance
degradation in the systems and networks housing their mission-critical applications
and services. Through the creation of a transition team, the organization can address
these attitudes and perceptions and largely allay those fears through several means.
First, careful planning will avoid the inclusion of high-intensity applications
and services, which are inappropriate for a VS environment. Second, the team should
negotiate a highly specific SLA with the stakeholders.
Note: High-utilization applications designed to use high-end hardware may not provide
adequate performance if running on a VM, due to the inherent performance tax associated
with the virtualization layer. Microsoft SQL Server™ and Microsoft Exchange Server,
for example, can run on a VM. Depending on their workload in a given situation,
however, they may not be good candidates for virtualization.
Admittedly, the business unit stakeholders may experience some level of operational
performance degradation—even for appropriate applications and services—during short
periods of time when aggregate demands on network and computing systems peak. However,
the quantifiable benefits such as reduced cost and improved provisioning times,
coupled with more subjective benefits such as improved agility and tighter security,
can far outweigh modest performance issues. The VSU team largely was able to reverse
those attitudes and perceptions through a well-planned and carefully executed transition,
coupled with operational performance over time in accordance with the metrics established
in the SLA. Thereby, a consensus quickly built around VS as an optimum solution,
for the essence of optimization is striking an appropriate balance between cost
and performance. On the whole and taking all metrics into account, VS can, in fact,
yield considerable improvements in both.
This paper begins with an examination of the Solution Framework, including consolidation
as a business strategy, virtualization as a consolidation technique, Virtual
Server 2005 as a specific product solution, and the process of migration to that
solution. We then define the terms relevant to virtualization and Virtual
Server 2005, and explore the deployment of Virtual Server 2005 as a utility
offering within Microsoft, from the consulting phase through the provisioning phase
and, finally, the operations phase. We discuss the perceptions and attitudes of
the internal client community towards consolidation and virtualization, the SLA
that was crafted to assure clients of improved service levels, and the results of
the pilot implementation as they compare to the SLA metrics. A nested case study
is formed around the migration of an LOB application serving the Law and Corporate
Affairs business unit. This paper concludes with some insight as to future directions
for Virtual Server 2005.
Solution Framework
Consideration of Virtual Server 2005 is set in the framework of the overall goals
and objectives of the organization. The ultimate goal of an organization is to maximize
the return on investment (ROI) or some other bottom-line measure of its effectiveness.
A for-profit enterprise seeks to maximize the return to shareholders over both the
short term and the long term. To achieve that goal, the component units of the organization,
both individually and as a whole, must have as an objective the optimization of
their day-to-day operations, striking a balance between cost and performance. Consolidation
is one of the strategies that can be employed to achieve the defined goals. Virtualization
is a tactical option in the hierarchy of this solution framework, and Virtual Server
2005 is a specific product solution.
Consolidation: Strategy
Microsoft has focused on consolidating its IT infrastructure since 1999. In total,
Microsoft has identified six different approaches for reducing costs through consolidation:
Physical site, server, database, applications and services, operations management
and operating environment. In the context of this white paper, consolidation refers
to the grouping of multiple physical servers in a single location. This level of
consolidation can dramatically reduce the server sprawl that develops as individual
business units and workgroups tend to place applications and services of local interest
on dedicated local servers. Consolidation increases operational efficiency, enhances
flexibility and reduces the TCO.
Virtualization: Technique
Virtualization is a consolidation technique that offers additional benefits by abstracting
the applications and services from the physical computer through the process of
re-hosting applications and services in a Virtual Machine (VM), a number of which
can reside in a single Virtual Server (VS) host. Virtualization, thereby, not only
groups multiple applications and servers in a single centralized location, but also
breaks the 1:1 relationship between applications and servers. Each VM and, depending
on the implementation specifics, each application and service retains some measure
of isolation, however, as each is associated with an operating system that is seen
as an individual operating system instance. Virtualization offers the additional
advantages of application and service agility, as applications can readily be moved
from one physical computer to another, with little regard as to hardware specifics.
Application owners traditionally place LOB applications and services on dedicated
hosts, where they may underutilize associated server resources. Candidate applications
for virtualization are of low to medium intensity with respect to input/output (I/O),
processing or compute, memory and networking requirements. As the server resources
are not highly stressed, their functional lifespan is commonly extended beyond not
only their warranty, but also their technical lifecycle, which can render them technically
unsupportable. Continued use of physical systems at or near the end of their usable
life can be costly from a maintenance standpoint, and can even put the applications
at risk. This is particularly true if the system is running an outmoded operating
system.
Migration: Process
Migration is the process of re-architecting or otherwise upgrading the software
and hardware for application servers. If necessary, Virtual Server 2005 will allow
multiple operating systems to reside on the same physical computer, allowing multiple
previously incompatible applications to run side by side, with each fully isolated
from the others. Migration to the latest Windows operating systems is preferable,
however, as it can provide significant improvements in security, availability and
manageability, especially when deploying them onto newer hardware platforms. In
either case, key to consolidation and virtualization is the development of a standard
migration process for assessing, planning, designing and implementing solutions.
Microsoft provides an additional tool to facilitate migration of physical servers
to a Virtual Machine running on Virtual Server 2005. The Virtual Server 2005 Migration
Toolkit (VSMT) works in conjunction with Microsoft Automated Deployment Services
(ADS) to capture and redeploy an image of the source server's disks to a virtual
representation of the original hardware configuration. In addition to migrating
in a traditional Physical-to-Virtual scenario, VSMT also supports migration of VMware
Virtual Machines to a format suitable for Microsoft Virtual Server.
Virtual Server 2005: Solution
Virtual Server 2005, also referenced as Virtual Server and VS, is the Microsoft
virtualization solution. An understanding of VS requires familiarity with terms
such as physical computer, host, Virtual Machine, Virtual Server and Virtual Guest:
Physical Computer
A physical computer is a physically distinct host computer, or machine, that provides
resources and capabilities including I/O, processing or compute, memory, storage
and networking.
Virtual Server Host
The Virtual Server (VS) Host is the physical computer that hosts, or runs, the Virtual
Server service. A single VS Host is a server that can simultaneously host multiple
Virtual Machines. If necessary, each VM can run a different operating system. For
example, a Virtual Server 2005 host can simultaneously support one VM running Windows
Server™ 2003, one running Windows NT® 4.0 and one running Windows 2000 Server™,
with each VM fully isolated from the others.
Virtual Machine
Also referenced as a Virtual Guest, a Virtual Machine (VM) is a logical computer,
hosted within the confines of a physical server running the Virtual Server service.
Comprising an operating system, configuration information and one or more virtual
disk files, a VM emulates a complete physical computer, including I/O, processor,
operating system, memory, storage and network interface card (NIC) or network adapter.
A number of applications and services can reside on a single VM. A number of VMs
can reside on a single VS host, as illustrated in Figure 1.
Figure 1: Physical Servers virtualized as Virtual Guests residing on a single physical
Virtual Host
Virtual Server Hosting Scenarios
VS deployments may take two forms: self-hosted and utility-hosted. The self-hosted
form describes a scenario in which the application owner also owns the physical
host server, the VS configuration and the associated VM allocations. The server
could be situated locally or in a data center. In either case, the owner retains
all burdens of ownership. The utility form describes a scenario in which a centralized
group is chartered to provide VS services to the application owners. The VS Utility
(VSU) owns the physical host computers and the VS software configuration, and allocates
the VMs residing on the machines on behalf of the application owners. While the
clients retain administrative access to the VMs, the burden of administering the
centralized physical computers and the VS software configuration shifts to the VSU.
Deploying Virtual Server 2005 At Microsoft
Microsoft first concentrated on physical consolidation, which set the foundation
for further improvements through the technique of virtualization using Virtual Server
2005. Microsoft began to focus on consolidating its IT infrastructure in 1999, with
the deployment of Windows 2000 Server™, the Microsoft Active Directory® directory
service and Exchange Server 2000. In total, Microsoft has identified six basic options
available to organizations wishing to consolidate a highly distributed computing
infrastructure:
- Physical site: Reducing the number of physical locations where resources reside.
- Server: Reducing the total number of individual servers for a particular application,
  either in a single physical site or across multiple sites.
- Database: Combining data from multiple databases into a single repository.
- Applications and Services: Combining multiple applications and services on fewer,
  shared servers.
- Operations Management: Grouping skilled operations management staff in fewer physical
  locations.
- Operating Environment: Standardizing on fewer versions of the same operating system.
Reductions in TCO are the most compelling reasons to implement these consolidation
options, as they can yield significant, measurable increases in efficiency, productivity
and other cost benefits by reducing server hardware and software costs. They also
can yield reductions in the number of staff involved in systems administration,
monitoring and maintenance, perhaps allowing skilled staff to be reassigned to more
challenging roles of greater value to the organization. Further, the organization
can expect increased system flexibility, reliability, availability, security and
performance.
Experience in server and data center consolidation at Microsoft yielded savings
of $18.3 million U.S. (June 2004), which represents a 40 percent reduction from
pre-consolidation levels. Of that total, $8.9 million resulted from server consolidation,
attributable to the removal of LOB and other distributed servers, and the elimination
of remote and unmanaged servers in branch offices.
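The relationship between these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 40 percent figure is the ratio of savings to pre-consolidation cost, and the function and variable names are hypothetical rather than taken from any Microsoft tool.

```python
def implied_baseline(savings: float, reduction_fraction: float) -> float:
    """Return the pre-consolidation cost implied by a savings amount and
    the fraction of the original cost that the savings represent."""
    if not 0.0 < reduction_fraction <= 1.0:
        raise ValueError("reduction_fraction must be in the range (0, 1]")
    return savings / reduction_fraction


# Figures quoted in the text, in millions of U.S. dollars.
TOTAL_SAVINGS = 18.3
SERVER_CONSOLIDATION_SAVINGS = 8.9
REDUCTION_FRACTION = 0.40  # assumed reading: savings / pre-consolidation cost

baseline = implied_baseline(TOTAL_SAVINGS, REDUCTION_FRACTION)
server_share = SERVER_CONSOLIDATION_SAVINGS / TOTAL_SAVINGS

print(f"Implied pre-consolidation cost: ${baseline:.2f} million")
print(f"Share of savings from server consolidation: {server_share:.1%}")
```

Under that reading, the implied pre-consolidation cost is about $45.75 million, and server consolidation accounts for a little under half of the total savings.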
In many respects, the process of consolidation at the physical level (i.e., physical
sites, servers and operations management staff) is straightforward. Consolidation
is well understood at this level, with the general process of situation analysis
being well established and the solution options being readily apparent. Consolidation
at the logical level is somewhat, but not entirely, an extrapolation of that concept
and the associated processes and solutions. The initial implementation of Virtual
Server 2005 within Microsoft was intended to fill that void while yielding what
were anticipated to be considerable operational benefits to the corporation.
The deployment of Virtual Server 2005 at Microsoft is described in three phases.
The Consulting Phase addresses the development of consultative relationships between
the VSU team and the Business Unit IT (BUIT) departments as prospective clients.
The Provisioning Phase describes the process of screening a candidate application
for virtualization, testing it on a Qualification Host and finally migrating it
to a Production Host. The Operations Phase describes the responsibilities of the
VSU team and the application owners, and the specifics of the VSU service offering.
Consulting Phase: Perceptions and Attitudes
Microsoft makes substantial investments in the development of new technologies and
the applications and services they support. Microsoft not only tests these solutions
internally, but also acts as its own first customer and one of its most demanding.
In introducing these new technologies, the Microsoft IT department faces the same
challenges as any client, although perhaps to a greater degree. Many Microsoft employees
have extensive experience in software development, system and network design, implementation,
operations and management. The customer base is so technically proficient as to
carefully scrutinize, and potentially resist, any attempt to consolidate systems
and application software, especially when associated with the imposition of a utility
model. Microsoft business units and individual users have many of the same concerns
as any other customer. These concerns include:
- Loss of flexibility
- Lack of responsiveness
- Diminished security
- Degraded performance
- Loss of control
- Loss of job security
Abstracting the hardware from its owners by physically removing the local servers
and consolidating them in the data center was challenging. Abstracting the servers
from the hardware through the creation of Virtual Machines was no less so.
Optional Participation
Participation in the pilot implementation of Virtual Server 2005 was optional. In
order to move the pilot forward, the Compute Utility team addressed the concerns
of each Microsoft BUIT team and the application owners it served. During the consulting
phase of the pilot, the prospective client presented its application/server requirements
for analysis by the VSU team. The resulting presentation considered the BUIT's compute,
memory and networking requirements, as well as the preferred maintenance window.
Where appropriate, the VSU team developed custom solutions that included, for example,
unique exceptions processes.
Service Level Agreements
Critical to the success of the pilot was translating the conceptual benefits of
VS into an SLA that presented a clear and compelling case to the client when contrasted
not only with conventional in-house BUIT performance, but also with a self-hosted
VS solution. The SLA also had to present a realistic challenge to the VSU team.
At a high level, the standard VSU SLA compares most favorably with self-hosting,
as reflected in Table 1.
| SLA Provision | Self-Hosted Physical Server | Virtual Server Utility |
|---|---|---|
| Server Provisioning | ~22-25 Days | 1 Day |
| Planned Hardware Move/Add/Change | ~7 Days | 1 Day |
| Support Availability | 24x7: 8x5 On-Site, After-Hours Remote | 24x7: 8x5 On-Site, After-Hours Remote |
| Host Availability | N/A | 99.99% Uptime, Active Monitoring |
| Guest Availability | Actively Monitor Heartbeat | Actively Monitor Heartbeat |
| Host CPU Utilization: Average | N/A | 70%, Active Monitoring |
| Host CPU Utilization: Maximum | Active Monitoring | Active Monitoring |
| Respond to Client Request¹ | 30 Minutes | 30 Minutes |

¹ Resolution sensitive to nature and specifics of client request.
All of the SLA provisions compared favorably with the conventional in-house solutions
administered by the BUIT departments. Provisioning time offers a startling comparison,
targeted at one business day by the VSU team compared with the typical interval
of 22-25 business days required to provision a physically distinct server in the
conventional manner. Many provisions do not compare in the least, as they do not
even apply in a self-hosted physical server mode. According to Chad Lewis, Microsoft
Lead Program Manager for IT Utility Services, "We get so close to the technology
that we sometimes feel as though the solution speaks for itself. At the same time,
we realize that client perceptions can be quite different. To overcome the natural
resistance associated with moving to a new application platform, we created our
SLAs and billing model so that our clients could directly compare their experience
and costs between a physical server and a Virtual Guest. And, across the board,
our implementation matches or exceeds in both those areas."
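To make the 99.99 percent host availability target concrete, it helps to convert it into an allowable downtime budget. The following is a minimal sketch of that conversion; the function name and the choice of a 365-day year and a 30-day month are assumptions made for illustration, not part of the VSU SLA.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Return the downtime, in minutes, that an availability target
    permits over a period of the given length."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours * 60.0


HOST_AVAILABILITY_TARGET = 0.9999  # 99.99% uptime, from the SLA comparison
HOURS_PER_YEAR = 365 * 24          # assumed 365-day year
HOURS_PER_MONTH = 30 * 24          # assumed 30-day month

yearly_budget = downtime_budget_minutes(HOST_AVAILABILITY_TARGET, HOURS_PER_YEAR)
monthly_budget = downtime_budget_minutes(HOST_AVAILABILITY_TARGET, HOURS_PER_MONTH)

print(f"Allowable downtime per year:  {yearly_budget:.2f} minutes")
print(f"Allowable downtime per month: {monthly_budget:.2f} minutes")
```

Under these assumptions the target leaves roughly 52.6 minutes of host downtime per year, or a little over 4 minutes per month.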
Cost Savings
In order to present a compelling case for the utility service offering, the VSU
team knew that, in addition to performance enhancements, it had to offer cost reductions,
which had to include both capital and operating costs. The utility model offered
the application owners considerable reductions in capital expenditures, as the cost
of a VM is much less than that of a physical server. Further, as the application
owner's compute requirements change over time, the capacity of a given VM or the
number of VMs provisioned by the VSU can expand and contract accordingly. The application
owners essentially pay only for what they need and only when they need it.
Many overhead and operational costs associated with ownership of server hardware
also shifted to VSU, with examples being maintenance and repair, rack space rental,
power, insurance, and network connectivity. A combination of consolidation, virtualization
and the pure economies of scale were anticipated to yield cost reductions of approximately
20 percent. Detailed cost comparisons were presented to the BUITs and application
owners during the consulting phase of the pilot.
Transparent Billing
VSU planned to recover those reduced costs through a combination of non-recurring
charges for the initial buy-in and monthly billing for its managed services. As
the monthly billing format was both detailed and easily readable, the VSU value
proposition was regularly validated and reinforced for the service subscribers.
Provisioning Phase: Making It Happen
Once a client requests service from the VSU, the build process begins. The build
process for a VM is essentially the same as that for a physical server, in that
the same data center standards apply in every respect. Also, the
same management tools are installed as part of the initial provisioning. A key difference
is in provisioning time, with the target of one day for a VM, compared to the typical
experience of 22-25 days for a physical server. Provisioning a physical server involves
a lengthy hardware procurement process; the physical build process from the shipping
box to the server rack; and installation of the operating system, application software,
drivers and monitoring software. A VM requires no hardware procurement and no physical
build, but only the software installation. There are no custom OEM drivers to install,
so the software install can be more standardized, and thus more efficiently completed
as compared to a physical server. Assuming that a VM slot is available on a VS Host,
provisioning largely is a matter of configuring the VM on the Host and copying the
necessary files.
There are several distinct steps to the VM provisioning phase. First, the application
must be screened for candidacy. As previously noted, Microsoft SQL Server™, Microsoft
Exchange Server and other enterprise-class, high-utilization applications designed
to use multiprocessor hardware may not be good candidates for virtualization.
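The screening step can be pictured as a simple checklist applied to a profile of the application. The sketch below is a hypothetical illustration, not the VSU team's actual screening procedure: the `WorkloadProfile` fields, the `screening_concerns` function and the 60 percent CPU cut-off are all assumptions, since this paper does not quantify "high utilization". Only the 3.6GB per-VM RAM ceiling comes from the paper.

```python
from dataclasses import dataclass
from typing import List

MAX_VM_RAM_MB = 3686   # about 3.6GB, the per-VM RAM ceiling cited in this paper
HIGH_CPU_PCT = 60.0    # hypothetical cut-off for "high utilization"


@dataclass
class WorkloadProfile:
    name: str
    avg_cpu_pct: float          # average CPU utilization on the current server
    ram_mb: int                 # memory the application needs
    needs_multiprocessor: bool  # designed to exploit multiprocessor hardware


def screening_concerns(profile: WorkloadProfile) -> List[str]:
    """Return the reasons a workload may be a poor virtualization candidate.
    An empty list means no concern was raised by these checks."""
    concerns = []
    if profile.needs_multiprocessor:
        concerns.append("designed to use multiprocessor hardware")
    if profile.avg_cpu_pct >= HIGH_CPU_PCT:
        concerns.append("high average CPU utilization")
    if profile.ram_mb > MAX_VM_RAM_MB:
        concerns.append("needs more RAM than a single VM can be given")
    return concerns


lob_app = WorkloadProfile("departmental web app", 10.0, 512, False)
busy_db = WorkloadProfile("busy database server", 75.0, 8192, True)

for app in (lob_app, busy_db):
    found = screening_concerns(app)
    verdict = "candidate" if not found else "review: " + "; ".join(found)
    print(f"{app.name}: {verdict}")
```

A real screening pass would also weigh I/O and networking demands, which are omitted here to keep the sketch short.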
Once the application has been screened, the VM can be installed on a Qualification
Host specifically designated for performance testing of VMs after they have already
been through code/functional testing in the IT Labs and prior to their installation
on a Production Host. The Qualification Host, which is functionally equivalent to
a Production Host, provides a means of testing the performance of the VM and the
applications it supports in a Virtual Server environment. It also provides a means
of determining the impact on the VS Host. So, both the owner and the utility get
a good feel for the final solution, and can make adjustments as necessary.
The results of the qualification testing guide the provisioning decisions as the
VM moves to a Production Host. As a VM is abstracted from the underlying hardware,
it is completely and easily portable from a Qualification Host to a Production Host,
all of which currently are 4Px2.2GHz machines, i.e., machines with 4 processors,
each running at a clock speed of 2.2GHz. Porting a VM simply requires suspending
the VM, copying the configuration file to a Production Host and turning it up, a
process that typically takes less than an hour. There are two basic categories of VM Guests:
Standard and Custom.
| Option/Specification | Physical Host | VM:Host | Network Connectivity¹ | RAM² | HD³ |
|---|---|---|---|---|---|
| Standard | 4Px2.2GHz | ≥ 8:1 | Shared Copper Gbps | 512MB | 36GB, SAN |
| Custom | 4Px2.2GHz | ≤ 8:1 | Shared Copper Gbps | ≥ 1532MB | 36GB, SAN |

¹ A dedicated NIC and additional RAM up to 3.6GB are available at an additional
one-time cost, each, per VM.
² A dedicated NIC and additional RAM up to 3.6GB are available at an additional
one-time cost, each, per VM.
³ Additional SAN Hard Drive space carries an additional monthly cost, in MB increments.
- Standard: A Standard VM makes relatively light demands on the host system.
  No custom processor allocation is configured. The RAM allocation and connectivity
  requirements are within the capacity of the default configuration. Therefore, eight
  (8) or more standard VMs might share a Host comfortably. Appropriate for a Standard
  VM are legacy applications known to be low to medium utilization, particularly if
  they reside on EOW/EOL hardware, or new applications profiled and determined to
  be low-to-medium intensity in workloads. Departmental web applications and LOB applications
  are good examples. The vast majority of applications fit into this category.
- Custom: A Custom VM requires a guaranteed level of performance, either formally
  stated as an SLA requirement or simply as the result of business expectations.
  That level of performance demands a guaranteed capacity reservation, which may be
  a full processor, or the equivalent thereof. Therefore, a four-processor Production
  Host typically might be configured to support no more than four Custom VMs. The
  VM:Host ratio can go higher than 4:1, and the VM:Processor ratio can increase, if
  custom resource allocations are configured to ensure a high performance level at
  all times. Examples of Custom VMs include domain controllers, as they are critical
  to network operations and make intensive use of Active Directory. Certain applications
  with existing and well-known performance requirements also require a Custom VM.
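The ratios described for Standard and Custom VMs lend themselves to a rough estimate of how many Production Hosts a given mix of requests would need. The sketch below is a simplified illustration under stated assumptions: it uses eight Standard VMs per host as a conservative figure, four Custom VMs per four-processor host, and it does not mix the two categories on one host. The function name and the no-mixing policy are hypothetical and are not described as VSU practice.

```python
import math

STANDARD_VMS_PER_HOST = 8  # "eight or more" per host; 8 used as a conservative figure
CUSTOM_VMS_PER_HOST = 4    # one reserved processor each on a four-processor host


def hosts_needed(standard_vms: int, custom_vms: int) -> int:
    """Estimate the number of hosts for a mix of VM requests, assuming
    Standard and Custom VMs are placed on separate hosts."""
    if standard_vms < 0 or custom_vms < 0:
        raise ValueError("VM counts must not be negative")
    standard_hosts = math.ceil(standard_vms / STANDARD_VMS_PER_HOST)
    custom_hosts = math.ceil(custom_vms / CUSTOM_VMS_PER_HOST)
    return standard_hosts + custom_hosts


# Example: 20 Standard VMs need 3 hosts and 5 Custom VMs need 2 hosts.
print(hosts_needed(standard_vms=20, custom_vms=5))
```

In practice the results of qualification testing, rather than fixed ratios, would guide the final placement.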
Note: Virtual Server 2005 is a 32-bit application, running on x86-compatible computers
running Windows Server 2003. A version for x64-compatible systems running Windows
Server 2003 SP1 x64 Edition is scheduled for late 2005, with the release of Virtual
Server 2005 Service Pack 1. This version is currently in use at Microsoft. Virtual
Server 2005 supports up to 32 processors and 64GB of RAM, including up to 3.6GB
of RAM per VM. Virtual Server 2005 uses the network and storage features in the
physical computer, including the attached Storage Area Network (SAN) drives.
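The memory limits in the note above can be expressed as a simple sanity check on a planned host layout. This is a minimal sketch, assuming a hypothetical `validate_ram_plan` helper and a hypothetical 1GB reservation for the host operating system. Only the 64GB host limit and the 3.6GB per-VM limit come from the text.

```python
from typing import List

MAX_HOST_RAM_GB = 64.0  # host RAM supported by Virtual Server 2005, per the note
MAX_VM_RAM_GB = 3.6     # RAM ceiling per VM, per the note


def validate_ram_plan(host_ram_gb: float,
                      vm_ram_gb: List[float],
                      host_reserve_gb: float = 1.0) -> List[str]:
    """Return a list of problems with a planned RAM layout.
    host_reserve_gb is a hypothetical allowance for the host operating system."""
    problems = []
    if host_ram_gb > MAX_HOST_RAM_GB:
        problems.append(f"host RAM {host_ram_gb}GB exceeds {MAX_HOST_RAM_GB}GB")
    for index, ram in enumerate(vm_ram_gb):
        if ram > MAX_VM_RAM_GB:
            problems.append(f"VM {index} requests {ram}GB, above {MAX_VM_RAM_GB}GB")
    if sum(vm_ram_gb) + host_reserve_gb > host_ram_gb:
        problems.append("total VM RAM plus host reserve exceeds host RAM")
    return problems


# Eight Standard VMs at 512MB each on a host with 8GB of RAM.
print(validate_ram_plan(8.0, [0.5] * 8))
```

An empty list indicates that the plan stays within the stated limits; it says nothing about CPU, storage or network capacity.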
Operations Phase: Making It Work
For a utility model to work effectively, it is important to clearly define the responsibilities
of ownership and to delineate between those of the Virtual Server Hosts, which are
owned by the VSU team, and the VM Guests, which are allocated by the VSU team, but
owned by the application or service owner.
The VSU operations team assumes responsibility for all aspects of monitoring, managing,
maintaining and protecting the VS Hosts, and for allocation and configuration of the
VM Guests on those Hosts. As the VM's operating system is a separate operating system
instance on the network, application owners remain responsible for operating system
security configuration and certain other administrative functions in the same way
that they would be responsible for a physical server. Infrastructure issues such
as physical layer connectivity and data center operations remain the primary responsibility
of Data Center Services teams; Virtual Server Hosts have the same level of general
operations support as any other physical server in the data center. Any infrastructure
work performed on a VS Host is arranged and managed by the VSU operations team,
with the work being performed by Data Center Services, again just like any other
physical server. All client communications regarding the health and welfare of VS
Hosts and VM Guests are the responsibility of the VSU operations team. Should the
Guest fall below SLA levels for CPU availability, server utilization or any other
SLA component, the VSU operations team will identify that fact and work with the
client towards a favorable solution.
The VSU operations team monitors the VS Hosts to make certain that they meet Data
Center standards, but it is up to the VM Guest owner to ensure that the allocated
VM meets those standards.
VSU Service Elements: Operational Specifics
The VSU team offers a centralized service of VS Host support management and general
VM configuration. VSU service elements comprise Cost, Performance, Agility and Service
Management.
Cost
There are two categories for VM Guests: Standard and Custom. The Standard VM Guest
provides the best value for applications that do not require a fixed amount of CPU
resources, a lot of RAM, or dedicated network connectivity. A Custom VM Guest provides
specific and guaranteed CPU performance, more RAM, and the option for dedicated
connectivity. The basic specifications for both are contained in Table 2.
Standard and Custom Guests both involve a one-time charge, reflecting a portion
of the capital cost of the VS Host. The monthly recurring charges reflect a portion
of the monthly hosting charge for the Host, plus a managed services charge for the
VM Guest. In either case, a side-by-side cost comparison yields a savings of approximately
30 percent over three (3) years.
Note: The monthly charge for a Standard VM includes 1/8 of the monthly charge for
the Host. The monthly charge for a Custom VM includes 1/4 of the monthly charge
for the Host. In either case, the monthly charge includes 80 percent of the managed
services charge for the Guest operating system. A three-year depreciation schedule
applies to the capital cost of the Physical Hosts.
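The allocation rule in the note above can be written out as a short calculation. The following Python sketch is illustrative only: the function name and the dollar amounts are hypothetical placeholders, since the paper does not publish the actual Host or managed services charges. Only the 1/8, 1/4 and 80 percent factors come from the note.

```python
from fractions import Fraction

# Share of the Host's monthly hosting charge carried by each VM type (from the note).
HOST_SHARE = {"standard": Fraction(1, 8), "custom": Fraction(1, 4)}
# Portion of the Guest operating system's managed services charge that is billed.
MANAGED_SERVICES_PORTION = Fraction(80, 100)


def monthly_vm_charge(vm_type, host_monthly_charge, managed_services_charge):
    """Return the monthly recurring charge for one VM Guest."""
    share = HOST_SHARE[vm_type]
    return float(share * host_monthly_charge
                 + MANAGED_SERVICES_PORTION * managed_services_charge)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
standard = monthly_vm_charge("standard", host_monthly_charge=800,
                             managed_services_charge=100)
custom = monthly_vm_charge("custom", host_monthly_charge=800,
                           managed_services_charge=100)
print(standard, custom)  # 180.0 280.0
```

With these placeholder inputs, a Custom VM carries twice the Host share of a Standard VM, while the managed services portion is identical for both.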
Performance
The performance of a VM Guest was benchmarked using a low to medium intensity web
application, with each VM Guest allocated one physical CPU and 512MB of RAM. Performance
was equal to or greater than the same application running on a 4Px700MHz Pentium®
III (2GB RAM) or a 2Px1.26GHz Pentium® 4 (1GB RAM). Performance was equally
good at a ratio of 2:1 (Guest: Processor). VS Guest performance began to degrade
only when the VS Host was under heavy stress. No custom resource allocation was
employed during the benchmarking.
Agility
The VSU specifically designed the VM Guests for maximum agility, as the ability
to quickly port them across VS Hosts is a critical advantage of the utility service.
This agility allows the VSU team to move the VM Guests from a Qualification Host
to a Production Host in an hour or less and, thereby, to provision a
VS Guest within a day of receiving the order from the application owner. In the
event that a Guest begins to experience performance degradation on a given VS Host,
one solution is for the VSU team to coordinate with the owner and move the Guest
to another Host. For example, as illustrated in Figure 2, Virtual Host ABC has
begun to experience performance degradation as sustained CPU utilization has reached
90 percent, while Virtual Host XYZ is underutilized at 50 percent sustained CPU
utilization. Figure 3 illustrates the movement of the Web Application 1 on VM1 on
Virtual Host ABC to the unassigned VM2 slot on Virtual Host XYZ. The effect is to
relieve the performance problem on Virtual Host ABC and balance the load across
both Hosts at 70 percent CPU utilization. The total elapsed time associated with
this process typically would be in the range of one day, from the time the performance
problem is recognized until the movement of the Web Application is completed. Once
coordinated with the Guest owner, the VSU team can accomplish the actual move process
in less than one hour.
Figure 2. Two VS Host Systems, Server 1 is running at 90% utilization, Server 2 is
running at 50% utilization
Figure 3. VM Web App 1 is moved from Server DCCUVS01 to Server DCCUVS02. CPU Utilization
on Server DCCUVS01 is reduced to 70%, and Server DCCUVS02 is increased to 70%.
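The arithmetic behind Figures 2 and 3 can be sketched in a few lines of Python. This is a simplified model, not a description of any VSU tooling: the function name is hypothetical, both Hosts are assumed to have identical CPU capacity, and the 20-point load attributed to Web Application 1 is inferred from the 70/70 outcome rather than stated in the text.

```python
def move_guest(source_util, target_util, guest_load):
    """Return (source, target) sustained CPU utilization after moving one Guest.

    Assumes both Hosts have identical CPU capacity, so a Guest's load
    (in percentage points of Host CPU) transfers one-for-one.
    """
    if guest_load > source_util:
        raise ValueError("guest load cannot exceed the source Host's utilization")
    if target_util + guest_load > 100:
        raise ValueError("target Host lacks the capacity for this Guest")
    return source_util - guest_load, target_util + guest_load


# Host ABC at 90 percent, Host XYZ at 50 percent, Guest load assumed to be 20 points.
after = move_guest(90, 50, 20)
print(after)  # (70, 70)
```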
Service Management: Availability
The VSU team manages the VS Hosts through constant operational checks on the VM
guests to ensure that each is operating within Data Center standards. Any VM Guest
that is out of compliance, is putting the VS Host or any other VM Guest at risk,
or is found doing work other than that for which it was designated will
generate an immediate trouble ticket escalation advising the Guest owner of the
problem and establishing a time frame for resolution. If a resolution is not implemented
within that time limit and VSU operations determines that the Guest is putting the
VS Host or other VM Guests at risk, VSU operations shuts down the VM Guest. In combination,
these measures ensure 99.99 percent Host availability. In turn, the Host availability
enables VM Guest availability of up to 99.99 percent, assuming that it is properly
managed by the owner.
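To put the 99.99 percent figure in concrete terms, the following Python sketch converts an availability percentage into the downtime it permits. The function name is illustrative, and the calculation assumes a 365-day year measured around the clock; the paper does not state the measurement window used in the SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the downtime, in minutes, permitted by an availability target."""
    return period_minutes * (100 - availability_percent) / 100


# 99.99 percent availability leaves roughly 52.56 minutes of downtime per year.
print(round(allowed_downtime_minutes(99.99), 2))
```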
In order to maintain SLA objectives for both Standard and Custom VMs, the VSU team
leverages four mechanisms to manage CPU usage:
-
Placement: The initial screening process determines if the application is best suited
for a Standard or Custom VS Host. Performance testing on the Qualification Host
serves to validate that placement prior to moving the VM and application to a Production
Host.
-
Relative Weight: A relative weight is manually assigned to each Guest. A Guest with
a higher relative weight can demand CPU cycles from another Guest. A Guest with
a lower weight must release CPU cycles to a Guest with a higher weight, if so requested.
-
Maximum Capacity: Each VS Host has a finite CPU capacity, which is shared among
the VM Guests. Therefore, each Guest is manually assigned a maximum available CPU
capacity, which is sensitive to the demands of other Guests.
-
Reserve Capacity: Each VM Guest is manually assigned a given amount of CPU capacity
that is always available, regardless of the demands of the other Guests.
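The interaction of Relative Weight, Maximum Capacity and Reserve Capacity can be illustrated with a toy allocator. The Python sketch below is an assumption-laden model, not the scheduling algorithm that Virtual Server 2005 implements: it grants each Guest its reserve first, then shares the remaining Host capacity in proportion to relative weight, and never exceeds a Guest's maximum or its current demand. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    name: str
    weight: int      # relative weight: a higher value wins more contended CPU
    reserve: float   # CPU always available to the Guest (percent of Host)
    maximum: float   # ceiling on the Guest's CPU (percent of Host)
    demand: float    # CPU the Guest currently wants (percent of Host)


def allocate_cpu(guests, host_capacity=100.0):
    """Toy model: reserves first, then the remainder shared by weight.

    Assumes the reserves together do not exceed the Host capacity.
    """
    # A Guest never receives more than it asks for or more than its maximum.
    want = {g.name: min(g.demand, g.maximum) for g in guests}
    # Reserved capacity is granted regardless of the other Guests.
    alloc = {g.name: min(g.reserve, want[g.name]) for g in guests}
    remaining = host_capacity - sum(alloc.values())
    active = [g for g in guests if alloc[g.name] < want[g.name]]
    while remaining > 1e-9 and active:
        total_weight = sum(g.weight for g in active)
        granted = 0.0
        for g in active:
            share = remaining * g.weight / total_weight
            extra = min(share, want[g.name] - alloc[g.name])
            alloc[g.name] += extra
            granted += extra
        remaining -= granted
        # Guests that reached their ceiling drop out; the rest contend again.
        active = [g for g in active if want[g.name] - alloc[g.name] > 1e-9]
    return alloc


guests = [
    Guest("A", weight=200, reserve=25, maximum=100, demand=80),
    Guest("B", weight=100, reserve=10, maximum=50, demand=60),
    Guest("C", weight=100, reserve=10, maximum=50, demand=5),
]
print(allocate_cpu(guests))  # {'A': 65.0, 'B': 30.0, 'C': 5.0}
```

In this example Guest C needs less than its reserve, so the unused capacity is divided between Guests A and B in a 2:1 ratio that follows their weights.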
Communications
Communications between the VSU team and the application owners are meant to be early
and often. The SLA establishes a target of 30 minutes for the VSU team to acknowledge
a client request. Resolution of the request depends on its nature. For example,
the target for a break/fix is 30 minutes, which is the same for a Virtual Server
as for a physical server. The VSU strives to model the pre-existing break-fix and
change communication process in this case. In general, any communication about a
VM Guest mirrors exactly the experiences customers have with their physical servers.
The VSU team notifies owners by email of all changes that may impact a VM Guest.
Well-established escalation policies ensure that issues of significance gain the
proper level of attention and are afforded a rapid response. Examples, in general
order of severity, include:
-
VS Host adversely affecting performance of a VM Guest
-
VM corruption
-
Any issue caused by VS software
-
Multiple VMs on Host down
-
VS Host down
Change Management
Change management is critical. All change requests are entered into and are tracked
by various change tools. The VSU team advises the VM Guest owners in advance of
any planned configuration changes to the VS Host and, where appropriate, provides
them an opportunity to review and comment on those changes. The VSU team measures
the success or failure of a change by monitoring CPU Utilization, VS Host Availability
and Client Satisfaction. A change to a VM Guest is made only at the request of the
owner, unless it is required to protect the VS Host. In either case, the owner is
notified prior to the change.
Monitoring
Systems Management Server (SMS) and Microsoft Operations Manager (MOM) constantly
monitor VS Hosts and Virtual Servers, as is the Data Center standard. Additionally,
the VSU team monitors the VS Hosts for specific indicators that could signal that
the SLA might be in jeopardy and alerts the owners immediately. Such indicators
include CPU Utilization, Network I/O, Storage Utility (SU) storage, Host Availability
and VM Guest availability.
VS Host systems utilize standard OEM hardware-specific agents, as well as the standard
complement of Microsoft Systems Management Server (SMS) and Microsoft Operations
Manager (MOM) Host Agents. In addition to each Host being instrumented as a standalone
node, the MOM 2005 Virtual Server Management Pack is deployed to all Hosts to provide
enhanced capabilities to manage and monitor the aspects of Virtual Server and Virtual
Machines that are exposed through the Virtual Server APIs, performance counters, and
event log. Capabilities in the MOM VS MP include development of host-to-guest mappings
and control over VM states such as shutdown, start, pause, and save. These capabilities
also include performance monitoring of key counters and collection of key Virtual Server
and Virtual Machine events.
Virtual Machines are monitored as unique nodes from within their guest operating systems.
Each carries its own SMS, MOM, and other non-hardware-specific monitoring and management
agents or tools.
Security
Microsoft considers security to be of paramount importance for both itself and all
of its clients, both internal and external. In consideration of the dynamic nature
of security threats, Microsoft is constantly working to ensure that its products
and networks are highly secure. Virtual Server offers potential security advantages
in comparison to consolidation of individual applications (with multiple different
owners) onto a single operating system instance. For example, if eight separate
applications were to be consolidated onto a single operating system instance, all
eight of those application owners would have access to all the applications if granular
rights cannot be delegated or if administrative access to the operating system is
required, even if they don't have any responsibility for the host system or other
collocated applications. Additionally, the attack surface for each of those applications
increases, as that operating system instance now has many more end-users leveraging
that same system for multiple different uses. In contrast, eight applications consolidated
onto one physical host using Virtual Server means that each application has its
own operating system instance, its own unique administrator, its own IP address
and specific IPSec and Group Policy rules; each guest is a standalone security entity
with no relationship to the other seven guests on that same physical host. VM Guest
owners have access to several tools to administer their VMs:
-
Virtual Server Web Console enables secure, authenticated administration and client
remote access.
-
Automated Deployment Services and Virtual Server Migration Toolkit provide command
line tools for converting from physical to virtual or virtual to virtual, easing
migration to a virtual machine environment.
Providing physical consolidation while maintaining application independence is a
key security benefit of Virtual Server, and overall reduces the attack profile considerably.
Improved patch management has yielded improved security as the VSU team carefully
controls VS Host patching. While owners are responsible for Guest patching, the
VSU team works closely with them to ensure that Guest and Host patch processes are
tightly coordinated. The VS Utility team works to ensure that the VS Hosts and Guests
are well secured.
Microsoft IT has found an additional security benefit associated with the use of
Virtual Server in consolidation of legacy applications running on old hardware.
Such hardware can be costly or even impossible to maintain as spare parts may not
be readily available. That fact requires that SLA provisions, proactive maintenance
and management agreements be reduced, which can increase the likelihood that such
a system is not being maintained to security standards. Moving a system that is
still required by the business from an old and unsupportable hardware platform into
a Virtual Machine allows Microsoft to bring the application back into a fully supportable
environment and provide a higher level of service, which may increase the level
of security.
VS Hosts
Clearly, the Host must be secure for the Guests to be so. Therefore, Virtual Server
and VM administrative functions must be accessed over an authenticated and secured
connection. VM owners do not have administrative access to the VS Host
operating system or to the VS applications and interfaces.
VS Guests
VMs have unique security identities and are "first class citizens" on the network
with respect to IPSec policy, Windows firewall rules, networked services and so
on. Any VM exposed to the network must adhere to security standards appropriate
to that environment. The administrator for each VM has access to the configuration
for that VM, but not to the VS operating system or to other VMs. Each VM has its
own security identity and the ability to apply unique Group Policies or other specific
configuration that is required.
Data Protection and Storage Utility
It also is important that data is backed up and, therefore, protected from loss,
as well as theft. The VSU team ensures that the Host follows standard Data Center
backup policies, although it is up to the owner of the VM Guest to establish its
own backup schedule for application data. File-level drive backups capture copies
of all VM and network files. All Virtual Hard Disk file storage is on the SAN.
Just as Virtual Server abstracts the server from the hardware, so does the Storage
Utility (SU) abstract the storage from the hardware. All file storage, including
VM configuration, is on the SAN, which has virtually infinite capacity. VS Hosts
are connected to the SU fabric via redundant paths comprising dual fiber optic cables
and switches. Data backups are striped across multiple disks in the highly redundant
SU fabric. VM Guests are provisioned 36GB storage as a default, with additional
storage available as required. In the event of the total failure of a VS Host and
the resident VMs, the SAN allows them to be fully restored within a matter of minutes
of the provisioning of a new physical server to support them.
Figure 4. Virtual Server Host system with running Virtual Machines, showing the VM
Configuration and Virtual Hard Disk files on the Storage Area Network
Law And Corporate Affairs: A Case In Point
In the early stages of the Virtual Server 2005 pilot, the Compute Utility team held
discussions with Law and Corporate Affairs IT (LCAIT) towards leveraging VMs for
an existing mission-critical application. Those discussions led to the identification
of a specific internal tool as a candidate application. The tool performs as a middle-tier
in a system that handles tasks relating to providing and managing access to legal
documentation. The application performs such functions as loading, grouping, annotating,
searching, reviewing and printing of documents. The design of the tool requires
multiple instances, each within its own operating system and retaining unique identity.
The application requires a high level of availability, but is not constrained by
existing resources, even on relatively old hardware and lightweight configurations.
The tool seemed ideally suited for migration to a VS-hosted environment.
The application tool resided on 15 systems nearing the end of their usable life
in Datacenter #1, with those systems scheduled to be moved to Datacenter #2 within
a few months at a cost of approximately $900 in IT labor per server move. Once LCAIT
and the Compute Utility team reached agreement to evaluate the application for migration
to a virtualized environment, three VMs were created and configured on the VSU qualification
server. The LCAIT team conducted approximately three weeks of performance testing,
with positive results. Based on that experience, the teams decided to proceed with
redeploying the application on production Virtual Machines spread across multiple
VS hosts managed by the VSU team.
Since deployment, the solution has met and exceeded the expectations of the LCAIT
team. The expectation is that LCA will save $33,400 in capital costs by purchasing
shared VS Hosts instead of purchasing 15 stand-alone utility servers, and without
incurring any additional management complexity. They will also save approximately
$8,800 per year in hosting charges by leveraging VMs instead of physical servers.
Further, Microsoft IT will save upwards of $13,500 by avoiding the relocation of
15 outdated servers, plus the potentially excessive ongoing maintenance costs that
would have been required for those systems. In addition to the cost savings associated
with virtualization, LCAIT will realize improved processing power and enhanced scalability,
while shedding further concerns about hardware life cycles. Security on the shared
SAN SU is considered equal to that of the standalone SAN. Across the full range
of metrics, the Utility SLA promises performance equal to or better than the BUIT
can deliver on the basis of standalone self-hosted servers. Microsoft IT treats
SLAs as legally binding contracts, so customer expectations are high.
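The savings figures quoted for this case can be cross-checked with simple arithmetic. The Python sketch below uses only the amounts stated above; the function name is hypothetical, and the three-year horizon is an assumption borrowed from the depreciation schedule mentioned earlier, not a total reported for LCA. It also adds together savings that accrue to different parties (LCA and Microsoft IT), purely for illustration.

```python
SERVERS = 15
MOVE_COST_PER_SERVER = 900        # approximate IT labor per server move (USD)
CAPITAL_SAVINGS = 33_400          # shared VS Hosts vs. 15 stand-alone servers (USD)
HOSTING_SAVINGS_PER_YEAR = 8_800  # VMs vs. physical servers (USD per year)

# Relocation cost avoided by not moving the 15 outdated servers.
relocation_avoided = SERVERS * MOVE_COST_PER_SERVER


def cumulative_savings(years):
    """Combined capital, relocation and hosting savings after a number of years."""
    return CAPITAL_SAVINGS + relocation_avoided + HOSTING_SAVINGS_PER_YEAR * years


print(relocation_avoided)     # 13500, matching the figure quoted in the text
print(cumulative_savings(3))  # 73300, under the assumed three-year horizon
```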
Results
The actual results of the Microsoft internal implementation of Virtual Server can
best be examined by comparing them side-by-side with both the experience with self-hosted
physical servers, as a baseline, and the initial targets set for the VSU. Table
3 provides such a view across key SLA provisions.
| SLA Provision | Experience: Self-Hosted Physical Server | Target: Virtual Server Utility (Standard) | Experience: Virtual Server Utility (Standard) |
| Server Provisioning | ~22-25 Days | 1 Day | ≤1 Day |
| Planned Hardware Move/Add/Change | ~7 Days | 1 Day | ≤1 Day |
| Support Availability | 24x7: 8x5 On-Site, After-Hours Remote | 24x7: 8x5 On-Site, After-Hours Remote | 24x7: 8x5 On-Site, After-Hours Remote |
| Host Availability | N/A | 99.99% Uptime, Active Monitoring | 99.99% Uptime, Active Monitoring |
| Guest Availability | Actively Monitor Heartbeat | Actively Monitor Heartbeat | Actively Monitor Heartbeat |
| Host CPU Utilization: Average | N/A | 70%, Active Monitoring | 20%, Active Monitoring |
| Host CPU Utilization: Maximum | Active Monitoring | Active Monitoring | Active Monitoring |
| Respond to Client Request [1] | 30 Minutes | 30 Minutes | 30 Minutes |
| Cost | N/A | 20% Savings | ~30% Savings |
[1] Resolution sensitive to nature and specifics of client request.
Table 3. SLA Comparison: Self-hosted Physical Server And Virtual Server Utility
Taken as a whole, Microsoft experiences with the VSU have been very much in line
with expectations, meeting or exceeding them across every metric in the SLA, which
was intentionally quite challenging. VS has performed so well that Average Host
CPU Utilization, initially targeted at 70 percent, currently is only 20 percent
for the average VS Host. As a result, the VSU team has adjusted its expectations,
and intends to alter the qualification levels for candidate applications accordingly.
Critical bottom-line measurements include cost and customer satisfaction. Cost savings
were better than expected. The VSU team realized capital cost reductions exceeding
45 percent, which contributed to cost savings for the business units of approximately
30 percent, compared to the 20 percent initially anticipated. Measurements of customer
satisfaction were based on one-on-one, case-by-case feedback, which was
extremely positive. Some customers actually referred to the service as being so
good as to be transparent. Although this is anecdotal evidence, the VSU team took
the characterization of transparent virtualization as quite a compliment. The pilot
implementation was small enough that anecdotal measurement of customer satisfaction
was considered acceptable, although an automated ticketing system was placed into
service shortly after its conclusion. Associated with that system is automatic surveying
of customer satisfaction.
Future Directions
Virtual Server 2005 was designed as a highly scalable solution and VSU futures at
Microsoft include further cost reductions through the use of more capable VS Hosts.
The current specifications for Virtual Server 2005 include multicore computers running
up to 32 processors and providing up to 64GB of RAM, including up to 3.6GB of RAM
per VM. Support for 64-bit computers and the Windows Server™ 2003 x64 Edition operating
system is planned for late 2005 with the release of Virtual Server 2005 SP1. The
pilot intentionally limited the Hosts to a consolidation ratio of 8:1 on 4Px2.2GHz
machines, with an expectation that average CPU utilization would approach 70 percent.
Analysis of the results made it clear that the host utilization target was overly
pessimistic and that there is a good deal of room to increase the VM:Host consolidation
ratio without putting performance at risk. As commodity hardware continues to increase
in capability, further improvements in VM:Host compression will be achievable. That
will yield improved efficiencies, which will translate into further cost reductions.
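The headroom implied by these numbers can be estimated with a rough extrapolation. The Python sketch below is a back-of-the-envelope illustration, not a result reported by the pilot: it assumes that Host CPU utilization scales linearly with the number of Guests and ignores RAM, storage and network limits, any of which may cap the ratio sooner. The function name is hypothetical.

```python
import math


def cpu_bound_guest_ceiling(current_guests, observed_util, target_util):
    """Estimate how many Guests fit under a CPU target, assuming linear scaling."""
    per_guest = observed_util / current_guests  # average CPU percent per Guest
    return math.floor(target_util / per_guest)


# Pilot figures: 8 Guests per Host, 20 percent observed, 70 percent targeted.
print(cpu_bound_guest_ceiling(8, 20, 70))  # 28
```

Under that linear assumption, CPU alone would not become the limiting factor until well beyond the pilot's 8:1 ratio, which is consistent with the conclusion that the original utilization target was overly pessimistic.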
Further development of automated provisioning, ticketing and change management systems
will include user interfaces that will be intended to make the VSU interface as
intuitive as scheduling a meeting in Outlook. The application owner should have
visibility into the pool of VS Hosts to view and collect performance statistics
on a given VM, to make configuration changes based upon a set of rules defined
by the host owner, to suspend it and, ultimately, even to provision it. VSU plans
currently are to virtualize 10 percent of the data center in the near term. In the
longer term, virtualization may well enable considerable shifts in IT strategy.
For IT-centric organizations, that translates into shifts in core business strategy.
In Summary
Microsoft began consolidating its physical IT infrastructure some years ago, with
the deployment of Windows 2000 Server, the Microsoft Active Directory® directory
service and Exchange 2000 Server serving as enabling solutions. Virtualization appeared
as the next logical step in the progression, with the development of Virtual Server
2005. As is the practice for each new server and business productivity software
release, Microsoft IT acted as the company's first customer for Virtual Server 2005.
As Microsoft IT requirements are among the most challenging in the world, this pilot
implementation was intended to be a thoroughly rigorous test of Virtual Server's
capabilities. Also, the methods Microsoft IT employed and the lessons it learned
from these first experiences were expected to yield meaningful deployment and operational
guidance for customers in subsequent general release implementations.
The Microsoft IT department is organized along the lines of a utility model, comprising
Compute, Storage, and Data Protection Utilities. Within the Compute Utility, the
VSU offers Virtual Server 2005 to internal Microsoft customers as a centralized
managed service.
Securing the participation of internal application owners, and the BUITs serving
them, required the development of SLA metrics that built a clear and compelling
case for transitioning to a utility model. These metrics include server provisioning
interval, support availability, host availability, guest availability and host CPU
utilization. Cost savings and customer satisfaction, of course, are the bottom line,
and results met or exceeded expectations in every category. Future directions at
Microsoft call for the introduction of more capable physical hosts, which will yield
greater efficiencies and further lower costs. Improvements in provisioning, ticketing
and change management systems will include an intuitive user interface that will
put a measure of control back in the hands of the owners.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
the local Microsoft subsidiary. To access information via the World Wide Web, go
to:
http://www.microsoft.com/
http://www.microsoft.com/technet/itshowcase
For any questions, comments, or suggestions on this document, or to obtain additional
information about How Microsoft Does IT, please send e-mail to: showcase@microsoft.com
Microsoft Virtual Server 2005 Home Page
Solution Accelerator for Consolidating and Migrating LOB Applications
Deploying a Worldwide Site Consolidation Solution for Exchange Server 2003 at Microsoft
Server and Data Center Consolidation: Microsoft IT Enhances Cost Savings, Availability,
and Performance
Windows Server System Reference Architecture
The information contained in this document represents the current view of Microsoft
Corporation on the issues discussed as of the date of publication. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy
of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user.
Microsoft grants you the right to reproduce this White Paper, in whole or in part,
specifically and solely for the purpose of personal education.
Microsoft may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Microsoft, the furnishing
of this document does not give you any license to these patents, trademarks, copyrights,
or other intellectual property.