Desktop Virtualization: Care and Feeding of Virtual Environments
Virtualization has evolved from an anomaly that required explanation to a viable technology most of us can’t live without. Perhaps you’re using it for quality assurance testing, development, Web design or training. Maybe you’re part of the vanguard—setting the trend by deploying a virtual infrastructure, or even one of the masses using “cloud” virtualization from Amazon.com, Rackspace Inc. or another cloud vendor.
No matter how you’re using virtualization, if you’ve used it for any length of time, you’re no doubt realizing that it comes with its own set of challenges—just as maintaining physical hardware has its own dilemmas. Many issues are different; others are similar.
You’ve probably heard the word “hypervisor” bandied around for a while. It has become the cool term in virtualization. Hypervisors aren’t new, however. We’ve been using them as long as we’ve been using virtual machines (VMs). In fact, IBM coined the term hypervisor in the 1970s.
The hypervisor is the software that presents the guests running “virtually” on a system with a set of virtualized hardware. It abstracts the physical hardware for the guest OSes. The confusion comes about with the big push to “type 1 hypervisors” running on the x86 platform over the last several years, including Microsoft Hyper-V and VMware ESX Server. The hypervisor most people use—especially for client systems—is referred to as a “type 2 hypervisor.” What’s the difference?
- A type 1 hypervisor runs directly on the host hardware and does not require a “host OS.” Microsoft Hyper-V and VMware ESX Server are common examples of a type 1 hypervisor.
- A type 2 hypervisor requires a host OS to function. Generally, a type 2 hypervisor runs principally as a user-mode application on its host OS. Microsoft Virtual PC and VMware Workstation are common examples of a type 2 hypervisor.
More often than not, you would want to use a type 1 hypervisor for any “always-on” workload, such as a virtualized SQL or file server. At a minimum, it will use fewer resources than a type 2. A type 2 hypervisor, depending on the host, may also require a user logon in order to start, which isn’t a good option for a mission-critical system. It makes more sense for “on-demand” VMs, a role that includes VMs for testing, application compatibility or secure access.
What Does Virtualization Save?
The obvious answer is that virtualization saves money on hardware, but it’s not quite that simple. Sure, if you have two server systems in rack-mountable 1U form factors, and you take those same two workloads and load them on one 1U system, you’ve saved on up-front hardware costs. But there’s a trick to it. Suppose both workloads operate happily on two individual 1U servers, each with a dual-core CPU, 2GB of RAM and a 160GB SATA hard disk.
Now, when you put both of those onto one server, with the same hardware configuration, you’ll have to split the resources down the middle. Or will you? You’ll generally need more resources for a type 2 hypervisor, because its host OS consumes CPU, RAM and disk space of its own on top of what the guests need.
Then take the necessary CPU, RAM and HDD costs into account when figuring out how to consolidate workloads from physical to virtual. Virtualized consolidation is often called “stacking systems vertically instead of horizontally,” because you’re removing dependence upon n physical systems from an OEM. In turn, you’re asking far more of one individual system than you might have been prior to virtualization. This creates a systems-management ricochet many organizations don’t take into account as they rush headlong into virtualization.
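To make that arithmetic concrete, here is a minimal Python sketch of a consolidation estimate. The two workloads mirror the 1U servers described above. The overhead figures for the host, the server names and the function name are assumptions chosen purely for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cores: int
    ram_gb: float
    disk_gb: float


def consolidated_requirements(workloads, host_ram_gb=0.0, host_disk_gb=0.0):
    """Sum what the guests need and add whatever the host layer itself consumes.

    host_ram_gb and host_disk_gb are assumed overhead figures: small for a
    type 1 hypervisor, larger for a type 2 hypervisor on a full host OS.
    """
    return {
        "cores": sum(w.cores for w in workloads),
        "ram_gb": sum(w.ram_gb for w in workloads) + host_ram_gb,
        "disk_gb": sum(w.disk_gb for w in workloads) + host_disk_gb,
    }


# The two physical servers from the example: dual-core, 2GB RAM, 160GB disk each.
servers = [
    Workload("server-a", cores=2, ram_gb=2, disk_gb=160),
    Workload("server-b", cores=2, ram_gb=2, disk_gb=160),
]

# Hypothetical overhead for a type 2 hypervisor's host OS (illustrative only).
needs = consolidated_requirements(servers, host_ram_gb=1, host_disk_gb=20)
print(needs)
```

With these assumed numbers, the single consolidated box needs 4 cores, 5GB of RAM and 340GB of disk, which is more than either of the original servers had. Swap in figures from your own environment before drawing conclusions.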
What Does Virtualization Cost?
Once upon a time, good virtualization software cost quite a bit of money. Over time, the market has heated up, and you can get many types of virtualization software for fairly low cost. Most of the critical enterprise features still cost money, however, including the host OS or hypervisor.
Depending on the workload you’re planning to run on a VM, you may need to investigate failover. Guests get corrupted sometimes, and host hardware can fail. Virtualization doesn’t make hardware more reliable. It just changes the odds. For mission-critical systems, you still need to come up with a strategy for backing up the guest OS, whether you’re backing up the VM container itself (which is absolutely recommended) or the file system contained within.
Even if you’re just virtualizing a bunch of guest OSes for testing or development on a type 2 hypervisor, you still need to allocate enough RAM to run one or more of those guests at a time (on top of the host OS). The most often overlooked issue in virtualization management is disk space consumption.
I’ve used virtualization for some time as a security test bed. Nothing beats running a potential exploit on a VM, seeing it work, and rolling back to an earlier version using your hypervisor’s undo or snapshot functionality, only to retest it again. The real beauty of stacking these undo changes one on top of the other is that disk space can rapidly get out of control. It may end up far exceeding the actual size of the hard disk within the guest OS itself.
One of the VMs I use regularly has a 50GB hard disk image—I didn’t realize how out of control it had gotten until I attempted to move it (it had six VMware snapshots), and the disk was well over 125GB.
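One way to catch this kind of growth before a move forces the issue is to compare a VM’s footprint on the host with the nominal size of its guest disk. The following is a rough Python sketch. It assumes each VM lives in its own directory alongside its snapshot files, and the function names are made up for this example.

```python
import os


def directory_size_bytes(path):
    """Total size of every file under path (disk images, snapshot deltas, logs)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def bloat_ratio(vm_dir, nominal_disk_bytes):
    """How many times larger the VM is on the host than its guest disk's nominal size."""
    return directory_size_bytes(vm_dir) / nominal_disk_bytes
```

For the VM described above, a 50GB guest disk occupying 125GB on the host would yield a ratio of 2.5. Anything well above 1.0 suggests that snapshots or undo files deserve a look.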
Here are a few best practices to minimize the impact/cost of virtualization:
- If you’re using a Windows client OS on a type 2 hypervisor with “undo” functionality, then by all means, disable Windows System Restore. Otherwise, you’ll have disk growth every time you make a system change.
- If you disable System Restore as described in the previous item, be religious about demarcating when you do want to create an undo point.
- If you’re doing security/exploit testing—do not rely on Windows to roll you back to an earlier point in time. Use your hypervisor’s undo functionality, as it can’t generally be tainted in the way restore points can be.
- Run guest OSes with the minimal amount of resources necessary.
- Ensure you’ve allocated enough RAM so guest OSes aren’t swapping RAM to disk all the time. This can slow down your host and all of your guests.
- Defragment your guests internally, and then defragment them externally (see the section on defragmentation later). Do both with some regularity.
As you can see, managing VMs can quickly become a problem. The ease of duplicating a VM can be a great benefit, but it can also create huge problems with managing and securing guests, keeping track of Windows OS licenses (especially for versions before Windows Vista, whose newer key management can actually be of benefit here), and making sure trade secrets don’t escape from your grasp. It’s a heck of a lot easier for a rogue employee to take a VM out via a USB flash drive or USB hard drive than it is for them to try and take an entire desktop system.
VM proliferation is much more of a problem among the highly technical (who understand the innards of virtualization). Generally, it’s also more prevalent among client guests, not among virtualized server guests.
Entire companies have started to focus on helping to regain control over virtualized systems. Both Microsoft and VMware have consciously focused less on the value of virtualization itself and more on systems management. This is important because you aren’t getting rid of systems—you’re just virtualizing them.
Many systems management products can perform perfectly fine on VMs—but some newer functionality allows more intelligent management of virtualized systems, including waking and updating of guests that would otherwise fail to be updated. In the era of zero-day exploits, that’s critical. The last thing you need is an infrequently used VM becoming the local botnet representative on your corporate network.
Your systems management approach needs to take into account that you have hosts and guests, ensuring they’re updated accordingly, and that it knows the roles of each. The last thing you want is a poorly designed patch management solution updating your hypervisor, and tearing it down in the middle of the day for a reboot, taking four mission-critical guest servers with it.
You also need to be approaching recovery of these systems in the same way you would have historically. Just because a system is virtualized doesn’t mean you can’t lose it due to registry corruption or corruption of the entire VM—you can. Back up with the same fervor you apply to your physical systems today.
One extra consideration is whether your hypervisor offers undo functionality. Bear this in mind when patch management comes into account. It’s easy to update a guest on the Wednesday after Patch Tuesday, have it rolled back to Monday’s undo point, only to get hit by a zero-day that it was theoretically “protected” against. This is a big problem, given that undo technologies work by rolling back to an earlier point of the entire disk presentation from the hypervisor, meaning you will lose any Windows and application patches, as well as any antivirus signatures.
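The rollback hazard comes down to comparing two timestamps per guest. Here is a minimal Python sketch, assuming you can obtain both the time of the undo point a guest was reverted to and the time it was last patched. The guest names, dates and function name are hypothetical.

```python
from datetime import datetime


def rolled_back_past_patch(snapshot_time, last_patch_time):
    """True if reverting to the snapshot would discard the most recent patch run."""
    return snapshot_time < last_patch_time


# Hypothetical inventory records: (guest name, undo point taken, last patched).
guests = [
    ("qa-xp", datetime(2010, 3, 8, 9, 0), datetime(2010, 3, 10, 14, 0)),
    ("dev-win7", datetime(2010, 3, 11, 9, 0), datetime(2010, 3, 10, 14, 0)),
]

needs_repatch = [name for name, snap, patched in guests
                 if rolled_back_past_patch(snap, patched)]
print(needs_repatch)
```

Only qa-xp is flagged here, because its Monday undo point predates Wednesday’s patch run. A guest flagged this way should be patched again, and have its antivirus signatures refreshed, before it goes back on the network.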
Undo functionality aside, you need to be providing the same security protection to virtualized guests as you would to physical machines, and then some. When it comes to inbound threats, VMs are just as susceptible as physical machines—it makes no difference.
But the big difference is that non-critical VMs (those that are not always on) often have latency for patching and AV updates. As a result, these can become a much bigger, untrackable target for zero-day exploits. This is all the more reason to ensure you’re using a mature systems management solution that can take this into account and patch virtual systems as well.
Outbound threats are a different matter. VMs can be a doorway to the theft of intellectual property. This is critical to understand because VMs running on uncontrolled hosts can create a loophole for your data. First, if the virtual environment can be copied easily, that’s a problem, especially if you’re dealing with any compliance requirements that control access to data (as I discussed in an article back in 2008: http://technet.microsoft.com/magazine/2008.06.desktopfiles).
Second, as you might recall from my article on RMS and IRM (http://technet.microsoft.com/magazine/2008.11.desktopfiles), these controls rely upon the OS to prevent screen capture, printing and so on. However, those controls don’t stretch out to the hypervisor—meaning that if RMS-protected content is displayed on a guest OS, the host OS can still print individual screenshots or create a video capture of the screen.
Though it isn’t technically analog, this isn’t entirely different from the “analog hole.” I’m not aware of any way to protect DRM-controlled content from being exploited in this manner. Realistically, even if you could, then you’re back at the problem of protecting from users with cameras or video cameras who can perform the same “exploit.”
Disk defragmentation is a unique challenge on VMs, for several reasons:
- You generally will have two levels of fragmentation. The first is fragmentation within the virtualized disk container itself (the fragmentation each guest sees on its own disk), which I refer to as “primary fragmentation.” The second is fragmentation of the actual file containing the virtualized disk across the disks of the host OS, or “secondary fragmentation.”
- Virtualization products with disks that are the minimum size required and grow “on-demand” can lead to secondary fragmentation.
- Undo functionality can rapidly lead not only to disk bloat, but massive secondary fragmentation—because as it consumes additional host OS disk space, each guest begins competing for available sectors.
- With disks that grow on-demand, most do not have the capability to shrink when demand wanes. If you allocate 40GB, only use 10GB initially, but grow to require 35GB, the disk will not recover on its own—meaning you’ve got a large file that’s much more likely to have secondary fragmentation.
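The behavior of a disk that grows on demand can be pictured as a high-water mark. The toy Python model below is a sketch only. It does not reflect any vendor’s disk format, and the class and attribute names are invented for illustration.

```python
class DynamicDisk:
    """Toy model of an on-demand virtual disk whose container never shrinks by itself."""

    def __init__(self, max_gb):
        self.max_gb = max_gb      # size the guest believes it has
        self.used_gb = 0          # data the guest currently stores
        self.container_gb = 0     # size of the backing file on the host

    def set_usage(self, used_gb):
        if used_gb > self.max_gb:
            raise ValueError("guest cannot use more than the allocated size")
        self.used_gb = used_gb
        # The container grows to fit new data but keeps its high-water mark.
        self.container_gb = max(self.container_gb, used_gb)


disk = DynamicDisk(max_gb=40)
for usage in (10, 35, 12):   # the final drop to 12GB is an assumed follow-on step
    disk.set_usage(usage)
print(disk.used_gb, disk.container_gb)
```

After usage falls back to 12GB, the guest reports 12GB in use while the container on the host still occupies 35GB. That gap is the space an explicit shrink operation would have to reclaim.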
The sheer size of virtual disks, the velocity with which they can change, shrink or grow, and the fact that they’re susceptible to two types of fragmentation mean you should treat them even more seriously than you would physical systems.
Here’s one approach to protecting your files:
- Minimize the use of any undo technology, as it will cause undue growth of the overall disk files, and can’t readily be defragmented in the guest, though the host can defragment the files that comprise the virtual disk.
- Use a good disk defragmentation product on your guests to begin with, and run it regularly.
- If you’re using on-demand disk expansion technology:
a. Use the Sysinternals sdelete.exe utility as follows: sdelete -c drive_letter where drive_letter is the volume you want to zero-out. For example, sdelete -c C: will zero-out all unused disk space after defragmentation. (Later SDelete releases assign zeroing of free space to the -z switch instead, so check the usage text of the version you have.)
b. Use any virtual disk shrinking technology (if provided by your vendor) to reduce the virtual disk container to its minimum size.
- Defragment the host OS’s volumes containing the VMs.
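Why zero out free space before shrinking? Deleting a file inside the guest typically leaves its old bytes on the virtual disk, so a shrink tool has no way to tell stale data from live data. The Python sketch below illustrates the idea, assuming a shrink tool that discards blocks containing only zeros. The block size and function name are assumptions, and real tools differ in the details.

```python
BLOCK_SIZE = 4096  # assumed block size for the illustration


def reclaimable_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Count blocks that are entirely zero and could be dropped by a compaction pass."""
    zero_block = bytes(block_size)
    count = 0
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if block == zero_block[:len(block)]:
            count += 1
    return count


# Three blocks: live data, stale data from a deleted file, and never-written space.
live = b"A" * BLOCK_SIZE
stale = b"B" * BLOCK_SIZE
empty = bytes(BLOCK_SIZE)

before_zeroing = live + stale + empty
after_zeroing = live + bytes(BLOCK_SIZE) + empty   # free space overwritten with zeros

print(reclaimable_blocks(before_zeroing), reclaimable_blocks(after_zeroing))
```

Before zeroing, only the never-written block counts as reclaimable. After zeroing, the block that held the deleted file’s data qualifies as well, which is why the zeroing step comes before the shrink.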
Many people disregard disk defragmentation. The sheer volume of reader mail I’ve received from my article on disk defragmentation in 2007 (technet.microsoft.com/magazinebeta/2007.11.desktopfiles) proved it’s often a misunderstood topic, but shouldn’t be ignored—even with virtualized systems.
As virtualization continues to explode in importance and use, it can become too easy to get swept into the “consolidate” message without understanding the costs, and its inherent unintended complexities. This should help you discover some of the additional costs you need to consider when migrating to, or living with, virtualization.
Wes Miller is the Director of Product Management at CoreTrace in Austin, Texas. Previously, he worked at Winternals Software and as a program manager at Microsoft. Miller can be reached at firstname.lastname@example.org.