Data Archiving, Linked Servers, Measuring Bandwidth, and More
Edited by Nancy Michell
Q I need a complete data-archiving strategy for my group. My current approach is based on the standard model of backing up data and then running utilities that will purge historical data. If the purged data needs to be retrieved, the environment and database are restored.
I'm looking for an archiving solution that can be done in stages. For stage one, data would be purged from the database but would still be accessible in real time. In other words, the data is removed so the size of production database is manageable, but end users are still able to access the data. In stage two, data would be completely removed and a restore would be required to access it. How can I do this?
A The easiest thing to do would be to restore the archive copy of the database, then modify the application to access the new, trimmed database or the old untrimmed one if the user so requests. As a result, you'll have a copy of the database for each purge period.
If data changes are slow and most of the data does not change between purges, you may end up with a lot of overlap and wasted space. On the other hand, since it is accessed infrequently you can probably get away with placing this archived data on a few large, slower, inexpensive drives (instead of many small, fast, expensive ones). If data gets wholly refreshed each period, this will suit you well. This process should be quite easy to manage and nearly foolproof.
Also, look into the possibility of using database snapshots-they offer a way to create a point-in-time replica before trimming the old database
. While snapshots may work well in some scenarios, they also have numerous implications. Database snapshots are "copy on write" and continuously maintain changes between the original database and the point in time when the snapshot is taken. If there are a lot of changes from the original, the snapshot can become quite large.
In some sense the database snapshot is the opposite of backup and restore. The former makes sense when you anticipate only a few changes since it only stores deltas, while the latter is more suitable when you anticipate a lot of changes. In this case, the benefit of "copy on write" space savings evaporates while the difficulty of managing snapshots persists.
Another option is to use table partitioning
. It allows you to take particular tables, designate key value ranges as belonging to different partitions, and then place those partitions on different filegroups. Once you have control over which data goes into which filegroup within one table, you can exercise better control over associated costs. You can choose which filegroups to place on which physical drives (see the earlier note about cheap vs. expensive drives) and which filegroups should be backed up and how frequently. If you make those filegroups read-only you can also use partial backups
Q My company is looking to implement and host an application that is expected to serve approximately 5,000 of our customers. We would like to implement one database per customer to insure confidentiality. Would the resources of the server be less efficiently used for a given data and index size, number of user connections, and workload if it was 1 database or 5,000?
A No matter how large your staff may be, maintaining the same code base between the databases would be difficult. Fix one bug in the database code and you have to roll it out 5,000 times. In addition, the database recovery times would be horrendous, as would be the time taken to do tasks like backups and database consistency checking. Think about it this way-does a bank have a separate database on their host for each customer who has a checking account?
SQL Server™ can create many, many databases, but that's not always the best approach. Just think of the work required to perform one transaction in each database in the same period of time; each database has its own transaction log. As we all know, one key to online transaction processing (OLTP) performance is maintaining an environment where the transaction log takes advantage of sequential writes to its underlying disk. If you want to perform transactions in these databases rather than just reading from them, don't use this approach.
Querying a Linked Server
Q I am developing a data warehouse on SQL Server 2005 Service Pack 1 (SP1). Our source server is on a partner's domain and the destination server is on our own domain. We are fetching the records from the source through a linked server. When we execute a query from the source with joins between the source tables, the query is executed very quickly. However, whenever we have even a simple join with our destination tables, the query takes forever. Why?
A Avoiding cross-server joins is a best practice, especially across a domain boundary. That said, based on your particular results this issue may occur because your partners trust your domain, but you don't trust the domain of your partners. Try refactoring your query to run it from the source, and then copy the results over to the destination. This may well give you a better outcome.
Q Are there any best practices for measuring the bandwidth used for database mirroring? I understand there's a requirement that it has to be about three times the log-generating rate, but if the demand is not heavy, is it possible to do database mirroring over a broadband connection over the Web?
A Besides raw throughput, you need to consider the round-trip message latency (more critical if safety=full) and also the link reliability. Database mirroring is optimized for a reliable network connection so there can be a fair amount of overhead on reconnection. If the broadband connection is unsteady (dropped connections or long delays) you may not be able to maintain a connection. For more help, check out this whitepaper
on mirroring performance considerations.
Thanks to the following Microsoft IT pros for their technical expertise: Denis Altudov, Jayashree Anand, Gus Apostol, Bhavana Bartholf, Mike Blaszczak, Dibyendu Chakraverty, Shaun Cox, Laurentiu Cristofor, Ernie DeVore, Michael Epprecht, Nakul Garg, Phil Hummel, Steve Lindell, Kaloian Manassiev, Al Noel, Ward Pond, Paul Randal, Remus Rusanu, Mike Shelton, Christy Sutton, Kevin Tsai, Peter Ty, Val Wittenberg, Steven Wort, Kalyan Yella, and Frankie Yuen.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited