Q What should my average disk queue length be? For example, if I have 20 separate physical spindles off of Storage Area Network (SAN) with RAID 01 configuration, how do I calculate average disk queue length? Is it Avg. Disk Queue Length/20 or Avg. Disk Queue Length/2?
A First of all, you should look at the disk latency before wasting any time on average disk queue length in a SAN environment. But it really does depend on what you are trying to figure out. You'll see why in a moment.
The explanation of this counter (from Perfmon) is "Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval." This is either a physical disk or logical disk counter, so the number that you get depends on how the underlying storage is presented to the operating system.
Let's look at your case. You have 20 spindles in a RAID 01 configuration, which means they are striped and mirrored (or mirrored and striped depending on how you read 01 or 10). The key point about your storage array is that there are 10 spindles in the stripe set.
But I'm missing essential information, such as what the stripe size is, how big your writes are, and what kind of I/O you're issuing (read, write, sequential, or random).
Ignoring the missing information for now, if the Avg. Disk Queue Length says 10, then the OS queued 10 I/Os to the disk array. Theoretically, that could be one I/O to each of the 10 mirrored sets in the stripe, or it could have been 10 I/Os all to the one disk. You have no way of knowing which it was.
This is where the missing information comes in. Let's assume your stripe size is 64KB, your write size is 8KB, and you're going to do a whole chunk of sequential writes. This is a typical scenario for SQL Server storage activity. In this case, there is a good chance that eight of the I/Os all went to the first disk, and the next two I/Os went to the next disk. So, if you're trying to work out the disk queue length per disk for this scenario, it is 8 for the first disk, 2 for the second disk, and 0 for the remaining eight disks in the array.
Now let's change the scenario to a theoretical stripe size of 8KB and a write block size of 64KB-and keep the disk queue length at 10. In this case, each 64KB block gets spread over 8 disks, so one I/O is written to 8 disks and the 10 queued I/Os are spread across 80 disk writes over all 10 disks in the array. If you're trying to calculate the disk queue length per disk in the array, it's going to be 8 for each disk in the array.
Let's be realistic and add another level of uncertainty into the picture. In most cases, your SAN storage is going to be connected to the server using one or more HBAs in the server, a number of fibers to connect the HBA to the SAN, a number of ports on the front end of the SAN, and perhaps some kind of fiber switch in the fabric between the server and the SAN. Then we get into the internal architecture of the buses inside the SAN, and how the disks are connected to the ports on the front of the SAN.
Any queuing you are seeing reported in Perfmon can be a symptom of high latency or queuing at any of these points between where the OS measures its disk queue length and the surface of the disks. This is why you should look at the latency and base any decisions on that counter rather than the average disk queue length.
Q I'm using transactional replication, and I know that many rows have been manually changed in a table at the subscriber, so I am getting errors when the publisher is trying to update a row that no longer exists in the subscriber.
I need to know if there is any way through replication to just re-initialize this one table from the publisher instead of reapplying a whole snapshot again. I looked into the tablediff function, which looks like it will do what I want, but I wonder how it interacts with replication.
For example, does tablediff take a snapshot-in-time of the publisher table and compare it with the subscription equivalent? Do I need to stop replication to use the tablediff utility to ensure the consistency of the data? What else do I need to know?
A First of all, tablediff does not take a literal snapshot of either the publishing or subscribing table. Regarding your particular scenario, there are a couple of options you could look at.
The first one would be to temporarily stop replication and run the utility. If you're worried about users attempting to make modifications to the data, you can use the -sourcelocked and -destinationlocked parameters, which will take out an exclusive lock on both tables while the utility is running. If this is unacceptable, another option would be to look at the -rc and -ri parameters while leaving replication running. This will run tablediff once and then run it consecutively on the errors it detects, eliminating errors which could be due to replication propagation delays. Be warned, however, that with this option, depending on your replication delay you may not necessarily catch all of the rows that have changed on the subscriber.
Thanks to the following Microsoft IT pros for their technical expertise: Sunil Agarwal, Chad Boyd, David Browne, Gilles Comeau, Emmanuel Dreux, Amanda Foote, Matt Hollingsworth, Paul Mestemaker, Uttam Parui, Paul Randal, Dennis Tighe, and Steven Wort.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.