SQL Q&AInside Clustering, Mysterious Hangs, the SA Account, and More
Edited by Nancy Michell
Q I need to gain a better understanding of how clustering works. Our environment will consist of 64-bit Windows Server® 2003 running SQL Server™ 2005, a SQL Server Reporting Services (SSRS) Web farm (a report server scale-out deployment), an SSRS TempDB Catalog server and SQL Server that pumps data from a third-party database via a linked server and will store the data for SSRS.
We want a 3-node active/active/passive cluster. Node1 would be active and would store the data pumped from the third-party database. Node2 would be active and would store the SSRS Catalog. Node3 would be passive and be a failover for either Node1 or Node2. Can you help?
A Unfortunately, too many people are misled by the terms active/active and active/passive in reference to SQL Server clustering. They think that SQL clustering may support "scale out" of one database or SQL instance across multiple servers. That's just not the case. On SQL Server, there is no such thing as an active/active database or instance. An "instance" is an installation of SQL Server with corresponding databases. Our clustering per SQL Server instance is active (1) to passive (n) always (note that the value of n is anywhere from 1 to 7 depending on your version of SQL Server). That's why it's called failover clustering.
Once that is understood, people can start to consider installing multiple instances of failover clustering on a set of nodes. For example three physical servers all using shared disks could have one instance that is active by default on Node 1 and a second instance that is active by default on Node 2, and both can fail to Node 3. The instances are completely separate; they do not share data and are not active/active. They are both active/passive and both share the same failover instance. If both instances fail to Node 3, then the challenge over time is knowing whether it will buckle under the load. By design, failover is intended to rely on equal processing power for the failover. If peak load requires two nodes for processing under normal operation, it is unlikely that Node 3 would survive under the peak load normally assigned to two nodes.
That said, given the relative cost of hardware capable of running a cluster, we understand how people would weigh the likelihood of both principal nodes failing simultaneously and forcing the entire load to one node. With that consideration in mind, they may make the business decision to assume the risk rather than purchase 100 percent failover capacity.
Fortunately, there's some good news: SQL Server 2005 offers many more options for high availability (HA), including alternatives that can fail over more quickly than a cluster and can even mean duplicate copies of data (clustering relies on a single SAN). The options include mirroring, peer-to-peer replication, and more. With those new alternatives, we have a lot more options to satisfy all sorts of needs, including some that may combine a number of HA features.
Cluster Configuration Validation Wizard (ClusPrep), now available for
, replaces what used to be the Hardware Certification List (HCL) testing, which could take months to validate a complete configuration to deem it "supportable" under clustering. This puts the hardware validation tool into the hands of the DBA, further driving down the cost (in dollars and time) of getting certified hardware in place. It may even make it possible to validate and deploy heterogeneous hardware within one cluster node set.
Q A delete procedure on one of my machines appears to be hanging after 12 hours. It's not blocked. Looking at the slowest query plan reveals a trigger running for 87,327 seconds, so I assume the procedure is hanging in that trigger. How can I see exactly which statement is hanging?
A It is quite possible that a loop within the trigger is not exiting for a variety of reasons. If you're hanging for a long time and you want to see which statement is executing, run the code in Figure 1. It will tell you which statement is currently running, and this should be the one that is causing your machine to hang.
Figure 1 Find currently executing statement