I hate tapes. They’re slow. They don’t hold much data. They tend not to work when you most need them. Tape is yesteryear’s technology; one that I personally wish would die the death it’s long deserved.
Now, disk? There’s a storage medium I can wrap my arms around. Disk as a backup medium is hip. It’s fast, even across consumer-grade connections like eSATA and FireWire. It holds a ridiculous quantity of data; I have a disk drive sitting next to me that holds more than the first refrigerator-sized SAN I deployed years ago. Disk pretty much always works when I need it to, and I can plug it into pretty much anywhere when I need to grab my data.
So if disk is so wonderful and tape so wonderless, why are we still buying backup tapes? Why do we still trust our backed up data to these ancient solutions that just don’t sparkle like they used to?
One word: History. Allow me to explain …
Tapes and tape drives have been around almost since the time of the first computers, with the first computer tape drive being invented in 1951. Today’s tapes, like DDS3 and SDLT among others, showed up slightly more than 10 years ago. With good backups being a critical part of any IT infrastructure, this means that any organization that has an IT infrastructure probably also makes use of tape-based backups. They’re a sunk cost.
Because tape-based backup is already in place, and because it probably cost a fairly good amount of money to purchase, tape represents a “stuck” technology for a lot of small and midsize environments. Tape-based backups are a solution that “just works” most of the time, so there’s little incentive to change our ways.
At least, until your first tape failure occurs.
It’s often when that first critical tape fails that many organizations take a step back and rethink how they protect their data. But rather than waiting for that horrible event to happen before you rethink your backups, consider an entirely different approach now.
Ponder with me for just a moment. Let’s think about that different approach, one where the storage medium for our backups is disks rather than tapes. How would that change the way we handle backups and restores? Let’s run through a little list of how things work in this new reality:
Disks are always connected. Tapes must be inserted into a device if they’re to be written to or read from. With any individual tape generally being smaller than the amount of data that needs backing up, this means tapes must be cycled in and out of that device over and over again. If you’ve got the money, you can purchase a robot to do this for you. If you don’t, you probably find yourself stopping in the office every weekend to shuffle tapes in order to complete your backups. Disks, on the other hand, are always connected to your network. This means that their entire storage capacity is always available and always on.
Reading and writing from disks needn’t be linear. With tapes, any job requires searching for the right location in a catalog, fast-forwarding, and eventually reading or writing the necessary files, a lengthy process that gets worse when files are fragmented across multiple tapes. This process always reminds me of loading video games in the old days with my Commodore 64’s cassette drive, a ridiculously slow process that ensured I spent more time playing outside than sitting in front of the computer. With disks, data can be read from or written to any available location on any disk platter. This means that expired data can be easily overwritten, needed data quickly found, and backups are never spread across multiple devices.
Disk reliability can be better monitored. Tapes are also notorious because you can never really be sure your data was successfully written without a subsequent restore and manual verification. The “tape” in tapes isn’t something that can be monitored using traditional tools like System Center Operations Manager. Further, when multiple tapes are needed for a backup set, the loss of even one can spell complete disaster. Conversely, with disks, every part is always online, making automated verification of a successful backup trivial.
Users can restore their own files. Once disks make every backup online and always available, it becomes easy to provide mechanisms for users to restore their own data. All you need is some form of client for them to work with. Those restores take no more time than a traditional file copy, and remove your help desk from an otherwise time-consuming restore each and every time Stan from accounting loses his budgeting spreadsheet.
Backups are smaller. Worried you’ll need a ridiculous amount of storage to hold all your backups? Think again. Data on tapes has to be written in a linear fashion, meaning that each file on each server must be slowly and deliberately transferred to the tape medium. While brute-force compression is possible during this process, it still occurs only one file at a time. One cannot, for example, leverage single-instance storage (SIS) technology to reduce the overall backup size. Using disks, SIS makes it possible to eliminate duplicate files and significantly shrink your overall storage requirements.
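To make the SIS idea concrete, here’s a toy sketch of content-based single-instance storage: identical file bodies across servers collapse to one stored copy plus tiny references. This is purely illustrative; DPM’s actual storage internals differ, and the file names and sizes below are made up for the example.

```python
import hashlib

def dedup_store(files):
    """Store each unique file body once, keyed by its content hash.

    files: dict of path -> bytes. Duplicate bodies across paths are
    kept only once in `store`; `catalog` maps each path to its hash.
    """
    store = {}    # content hash -> file body (stored once)
    catalog = {}  # file path -> content hash (a tiny reference)
    for path, body in files.items():
        digest = hashlib.sha256(body).hexdigest()
        store.setdefault(digest, body)
        catalog[path] = digest
    return store, catalog

# Three servers each hold the same 1MB template, plus one unique file.
template = b"x" * 1_000_000
files = {
    "srv1/template.doc": template,
    "srv2/template.doc": template,
    "srv3/template.doc": template,
    "srv1/unique.txt": b"only here",
}
store, catalog = dedup_store(files)
raw_bytes = sum(len(b) for b in files.values())    # ~3 MB on the wire
kept_bytes = sum(len(b) for b in store.values())   # ~1 MB actually stored
# Restore works by following the catalog back into the store.
restored = store[catalog["srv2/template.doc"]]
```

Three nominally separate copies shrink to one, which is exactly why SIS-style de-duplication makes disk-based backup stores so much smaller than a naive file-by-file copy.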
Backed up data is block-level and delta-based. Disk-based backups are also generally delta-based. This is a completely different architecture than how we used to think of “incremental” backups with tapes. Using disks, the granularity for changes is at the individual block level. This means that only the individual changed block needs to be backed up each time. As a result, if you change a single character in a 50GB file, you need only back up that changed block and not the entire file. The result is an even further reduction in overall storage size.
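The block-level delta idea can be sketched in a few lines: split each file into fixed-size blocks, hash each block, and back up only the blocks whose hashes changed since the last pass. The 4KB block size here is an assumption for illustration, not DPM’s actual chunking.

```python
import hashlib

BLOCK = 4096  # assumed fixed block size for this sketch

def block_hashes(data):
    """Hash each fixed-size block of a file's contents."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def changed_blocks(old, new):
    """Return indices of blocks that differ between two versions."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]

# A 1MB file (256 blocks) with a single byte changed in the middle.
v1 = bytes(1_048_576)
v2 = bytearray(v1)
v2[500_000] = ord("b")
delta = changed_blocks(v1, bytes(v2))
```

Only one block out of 256 differs, so only that one block travels to the backup store, which is the whole point of delta-based protection for large, slowly changing files.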
Backup windows are effectively zero. The mechanism by which block-level changes are captured also means that they’re backed up essentially as they occur. Today’s disk-based backup solutions integrate application logs with Volume Shadow Copy snapshots. Using VSS means that backups occur on snapshot data, reducing the backup window to something exceptionally close to zero. For applications like Exchange, SQL, SharePoint, and others, application logs are then rolled into the most recent VSS backup to ensure recovery with zero data loss.
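The copy-on-write trick behind snapshot-based backup can be shown with a toy in-memory “volume”: once a snapshot is taken, later writes preserve the snapshot-time value first, so the backup reads a consistent point-in-time view while production keeps changing. This is a loose conceptual sketch of the idea, not the VSS API, and the file names are invented.

```python
class Snapshot:
    """Toy copy-on-write snapshot of a dict-based 'volume'."""

    def __init__(self, volume):
        self.volume = volume  # live data: path -> bytes
        self.frozen = {}      # snapshot-time copies of changed paths

    def write(self, path, data):
        # Copy-on-write: preserve the old value the first time a
        # path changes after the snapshot was taken.
        if path not in self.frozen:
            self.frozen[path] = self.volume.get(path)
        self.volume[path] = data

    def read(self, path):
        # Backup reads see the volume as it was at snapshot time.
        if path in self.frozen:
            return self.frozen[path]
        return self.volume.get(path)

volume = {"mail.edb": b"v1"}
snap = Snapshot(volume)
snap.write("mail.edb", b"v2")         # production keeps writing...
backup_copy = snap.read("mail.edb")   # ...but the backup still sees v1
```

Because the backup runs against the frozen view, the application never has to stop, which is why the effective backup window shrinks toward zero.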
Now all this seems like waxing philosophical until you realize that this technology is available now in solutions like Microsoft System Center Data Protection Manager 2007 (DPM). Part of the System Center suite, DPM brings reality to the much-desired concept of disk-based backups.
While designed for environments both large and small, DPM is a particularly good fit for jack-of-all-trades environments due to its heavy focus on Microsoft technologies. Your small or midsize environment probably makes fairly heavy use of Microsoft technologies for your core infrastructure needs. If you use Exchange for mail services, SQL Server for databases, SharePoint and traditional file shares for document storage, and Hyper-V or Microsoft Virtual Server for your virtual servers, DPM is designed specifically with these applications in mind. In fact, DPM comes equipped with the necessary VSS writers that enable the online backup and item-level restore of each of these products (see Figure 1) right out of the box.
Figure 1 DPM backs up files and folders as well as specified applications.
I’ll leave the click-by-click instructions to DPM’s documentation. But considering its fundamentally different architecture, a few design tips can come in handy. More details can also be found within Microsoft’s excellent DPM Infrastructure Planning and Design guide.
Your first step with any new DPM installation is to determine what kinds of data you want to protect as well as each data type’s level of protection. Unlike traditional backups, where the best practice has long suggested weekly “full” backups followed by daily “incrementals,” you’ll find DPM to have quite a few more settings to consider when creating its Protection Groups. These Protection Groups are equivalent to a “backup job” within more traditional backup solutions.
The first question has to do with your tolerance for data loss. How much data can you afford to lose? Fifteen minutes’ worth? An hour? A day? This number, equivalent to the industry term “Recovery Point Objective,” defines how often you want DPM to complete a backup of changed data. Typically this number will differ by server and workload, with more-critical data such as Exchange mail needing a smaller period of time between backups than a little-used file server.
Unlike most tape-based solutions, DPM can back up changed data as often as every 15 minutes, or as infrequently as once a week. While backups that occur more often might seem like a tax on server resources, remember that only changed data is transferred. This means that more-frequent backups can be exceptionally quick in their operation, while once-a-week backups can take substantially longer to complete. If you’re concerned about network utilization during business hours, DPM can throttle its bandwidth usage during configured periods (see Figure 2).
Figure 2 Throttling DPM bandwidth usage during business hours.
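The throttling behavior pictured above boils down to capping the transfer rate during configured hours. A common way to implement such a cap is a token bucket; the sketch below is a generic illustration of that technique, not DPM’s implementation, and the 1 MB/s rate is an assumed configuration value.

```python
import time

class Throttle:
    """Simple token-bucket cap on transfer rate (bytes per second)."""

    def __init__(self, rate_bps):
        self.rate = rate_bps
        self.allowance = rate_bps        # bucket starts full
        self.last = time.monotonic()

    def send(self, nbytes):
        # Refill tokens for the elapsed time, then pay for this chunk;
        # sleep if the bucket can't cover it yet.
        now = time.monotonic()
        self.allowance = min(self.rate,
                             self.allowance + (now - self.last) * self.rate)
        self.last = now
        if nbytes > self.allowance:
            time.sleep((nbytes - self.allowance) / self.rate)
            self.allowance = 0
        else:
            self.allowance -= nbytes

# Cap replication at ~1 MB/s: 1.5 MB takes roughly half a second,
# since the first 1 MB is covered by the initially full bucket.
t = Throttle(1_000_000)
start = time.monotonic()
for _ in range(3):
    t.send(500_000)
elapsed = time.monotonic() - start
```

Swapping the rate (or disabling the throttle entirely) on a schedule gives you exactly the business-hours behavior shown in Figure 2.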
Question No. 2 relates to the amount of time you need to keep your backups. In the old-school tape paradigm, this quantity of time was often decided based on the number of tapes on hand. More tapes equaled more backups. However, today’s industry and regulatory compliance statutes mandate longer periods for many businesses. In any case, having more backups on hand enables you to move backward and forward in time when users need certain document versions restored.
Again, because disk-based backups only back up data as it changes, storing backups for longer periods of time needn’t consume substantial disk space. It’s no longer necessary to multiply the number of days of retention by the total disk consumption on each server—a very large number—to compute the amount of required storage space. Deltas are collected and stored along with the original server backup using the same sorts of SIS-based data de-duplication explained earlier.
The resulting space requirements are fairly impressive. As an example, Microsoft’s suggested space allocation for backing up a 12GB file share every 15 minutes for an entire year consumes slightly less than 110GB on disk. DPM automatically calculates its expected storage space needs as you create a Protection Group for each backup job (see Figure 3). This process reserves disk space for future needs, reducing the possibility that disks fill up many months down the road. You can find details on DPM’s storage calculations for file servers and other supported applications in Microsoft’s DPM documentation.
Figure 3 Calculating the storage consumption of a Protection Group.
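To see why retention math changes so dramatically, compare the naive “full copy per retained day” model against a baseline-plus-deltas model. The 2 percent daily change rate below is an assumption chosen for illustration; it is not Microsoft’s sizing formula, though it lands in the same neighborhood as the 12GB/110GB example above.

```python
def naive_full_copies_gb(data_gb, retention_days):
    """Storage needed if every retained recovery point were a full copy."""
    return data_gb * retention_days

def delta_estimate_gb(data_gb, retention_days, daily_change_ratio):
    """One baseline replica plus a delta per retained day.

    daily_change_ratio is an assumed fraction of the data that churns
    each day -- an illustrative model, not DPM's actual sizing formula.
    """
    return data_gb + data_gb * daily_change_ratio * retention_days

share_gb = 12  # the article's example file share
naive = naive_full_copies_gb(share_gb, 365)      # 4,380 GB of full copies
delta = delta_estimate_gb(share_gb, 365, 0.02)   # ~100 GB with deltas
```

Even a rough model like this shows an order-of-magnitude-plus reduction, which is what makes year-long disk-based retention practical in the first place.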
Connected disks can be locally attached or remotely connected using Fibre Channel or iSCSI connections, potentially adding value to existing highly available SAN storage. Connected disks will be converted to dynamic disks, with volumes on those disks converted to simple volumes. Volumes must be empty and unformatted prior to adding them to a DPM storage pool.
So at this point, you might be thinking, “This is great for my daily needs, but what about long-term archival? I’d love to see my tapes go away, but I still need a copy of my data stored off-site in case a tornado takes out my datacenter and all its ‘wonderful’ disks.”
It’s an absolutely valid concern, and one that DPM 2007 addresses. To solve this problem, DPM can be used in a straight disk-to-tape architecture, but without all the niceties discussed so far in this article. Using tapes alone, DPM really cannot fulfill its true potential for optimized backups and user-based restorations.
So, to really solve this problem, DPM also supports disk-to-disk-to-tape (D2D2T) configurations as well. As you can see back in Figure 1, a D2D2T configuration enables DPM to provide all its easy restore capabilities, with secondary backups for archival purposes being transferred to tape on a scheduled basis. Environments already invested in tapes and tape drives can still use that existing infrastructure in a more-limited capacity for off-site archival.
Admittedly, calling for the end of tape might be a bit premature, but that future isn’t far off. DPM 2010, Microsoft’s next version, is expected to include support for disk-to-disk-to-cloud architectures. Here, the tape device is eliminated entirely, with secondary backups scheduled for transfer to cloud-based storage. While still disk-based in nature, these cloud-based storage options can be in sites far removed from your production datacenter. Being geographically separated, cloud-based storage retains all the benefits of disk-based backups but in an off-site archival format.
The result is that your data remains easily restorable, as do your archival backups in the case of a disaster. When that disaster strikes, knowing that you can quickly bring back mission-critical storage and servers really helps administrators and business executives sleep well at night.
System Center Data Protection Manager is one solution that got off to a slow start. Adoption was initially sluggish, and this otherwise excellent product stumbled out of the gate in a few key areas. However, its current version—improved even more in its upcoming DPM 2010 release—overcomes many of those original limitations. If you’re in the market for guaranteeing your backups, reducing wasted time on restorations, and protecting your data like no tape can, consider disk-based solutions like DPM for your next IT infrastructure improvement.
Greg Shields, MVP, is a partner at Concentrated Technology. Get more of Shields’ Geek-of-All-Trades tips and tricks at ConcentratedTech.com.