SQL Q&A: Inside Clustering, Mysterious Hangs, the SA Account, and More

Article
08/31/2016

SQL Q&AInside Clustering, Mysterious Hangs, the SA Account, and More

Edited by Nancy Michell

Q I need to gain a better understanding of how clustering works. Our environment will consist of 64-bit Windows Server® 2003 running SQL Server™ 2005, a SQL Server Reporting Services (SSRS) Web farm (a report server scale-out deployment), an SSRS TempDB Catalog server and SQL Server that pumps data from a third-party database via a linked server and will store the data for SSRS.

We want a 3-node active/active/passive cluster. Node1 would be active and would store the data pumped from the third-party database. Node2 would be active and would store the SSRS Catalog. Node3 would be passive and be a failover for either Node1 or Node2. Can you help?

A Unfortunately, too many people are misled by the terms active/active and active/passive in reference to SQL Server clustering. They think that SQL clustering may support "scale out" of one database or SQL instance across multiple servers. That's just not the case. On SQL Server, there is no such thing as an active/active database or instance. An "instance" is an installation of SQL Server with corresponding databases. Our clustering per SQL Server instance is active (1) to passive (n) always (note that the value of n is anywhere from 1 to 7 depending on your version of SQL Server). That's why it's called failover clustering.

Once that is understood, people can start to consider installing multiple instances of failover clustering on a set of nodes. For example three physical servers all using shared disks could have one instance that is active by default on Node 1 and a second instance that is active by default on Node 2, and both can fail to Node 3. The instances are completely separate; they do not share data and are not active/active. They are both active/passive and both share the same failover instance. If both instances fail to Node 3, then the challenge over time is knowing whether it will buckle under the load. By design, failover is intended to rely on equal processing power for the failover. If peak load requires two nodes for processing under normal operation, it is unlikely that Node 3 would survive under the peak load normally assigned to two nodes.

That said, given the relative cost of hardware capable of running a cluster, we understand how people would weigh the likelihood of both principal nodes failing simultaneously and forcing the entire load to one node. With that consideration in mind, they may make the business decision to assume the risk rather than purchase 100 percent failover capacity.

Fortunately, there's some good news: SQL Server 2005 offers many more options for high availability (HA), including alternatives that can fail over more quickly than a cluster and can even mean duplicate copies of data (clustering relies on a single SAN). The options include mirroring, peer-to-peer replication, and more. With those new alternatives, we have a lot more options to satisfy all sorts of needs, including some that may combine a number of HA features.

The Microsoft® Cluster Configuration Validation Wizard (ClusPrep), now available for download, replaces what used to be the Hardware Certification List (HCL) testing, which could take months to validate a complete configuration to deem it "supportable" under clustering. This puts the hardware validation tool into the hands of the DBA, further driving down the cost (in dollars and time) of getting certified hardware in place. It may even make it possible to validate and deploy heterogeneous hardware within one cluster node set.

Q A delete procedure on one of my machines appears to be hanging after 12 hours. It's not blocked. Looking at the slowest query plan reveals a trigger running for 87,327 seconds, so I assume the procedure is hanging in that trigger. How can I see exactly which statement is hanging?

A It is quite possible that a loop within the trigger is not exiting for a variety of reasons. If you're hanging for a long time and you want to see which statement is executing, run the code in Figure 1. It will tell you which statement is currently running, and this should be the one that is causing your machine to hang.

Figure 1 Find currently executing statement

-- Look at the current statement being run:
-- Put results to text (Ctrl + T)
DECLARE @Handle binary(20), 
        @start int, 
        @end int,
        @SPID int

SET    @SPID = spid

SELECT @Handle = sql_handle, 
        @start = stmt_start, 
        @end = stmt_end 
FROM Master..sysProcesses(NOLOCK) 
WHERE SPID = @SPID

IF NOT EXISTS (SELECT * FROM ::fn_get_sql(@Handle))PRINT 'Handle not found in cache'
ELSE
   SELECT 'Current Statement'= substring(text, (@start + 2)/2, CASE @end WHEN -1 THEN (datalength(text))
       ELSE (@end -@start + 2)/2 END)
       FROM ::fn_get_sql(@Handle)

Q I need to support transactional replication through a firewall. The publisher and distributor are on the outside of the firewall and the subscriber is on the inside. The subscriber is set to listen on 1433 and these are my machine names: Publisher: PUBMACHINE, Distributor: DISTMACHINE, Subscriber: SUBMACHINE. What ports do I need to open to allow the initial snapshot and the publication push to succeed?

A If you are using a push subscription, opening the SQL Server port (1433 in your case) should be sufficient as the distribution agent will be running on the distributor machine (outside the firewall) and will likely have local access to the snapshot files generated by the snapshot agent. But if you are using a pull subscription, the distribution agent running on the subscriber machine will need to access the snapshot files through the firewall somehow. Here are the options that you can consider.

Assuming that the snapshot files are already accessible from a file share outside of the firewall, you can open up the Windows® file sharing port(s) through the firewall so the distribution agent running inside the firewall can access the snapshot files located outside (though beware of the security implications that this may have for other parts of your infrastructure). Note that if you have configured a local path as the default snapshot location (SSMS default), you may need to use the /AltSnapshotFolder option of the distribution agent to override the snapshot file pickup location.

You can also configure replication to use FTP for transferring snapshot files (and you would need to open up port 21 for that).

Q I'd like to know if there are any drawbacks to disabling the SA account in SQL Server 2005 and if disabling SA adds true security value. Is there a whitepaper on this question?

A On a new SQL Server 2005 installation when Mixed Mode is not enabled, the SA account is disabled by default and a random password is generated for it. You can also disable it yourself. There isn't a whitepaper on it, but disabling and renaming logins is discussed in a best practices paper.

If you want to protect yourself from attempts to break the SA account, you can rename it as well. Just remember, if you enable a disabled account, you should set a new password for it.

To answer your question about whether this offers true security, remember that the added security of disabling the account comes from the fact that password guessing is futile while the account is disabled. No matter how much time a hacker or virus has, a brute force attack against a locked account will not succeed. Renaming or disabling SA will break applications that are dependent on using the SA account for their connectivity. Finding and then fixing or eliminating these applications should be considered as a priority anyway. As mentioned, the account cannot be used to connect to the database until it is reenabled. Plus, because the authentication process fails earlier, a failed attempt will take a lesser toll on the attacked system.

Q One of my larger online transaction processing (OLTP) databases has a log file that is twice the size of the data file. I've tried using the following commands to get the log file down to a reasonable size, but I need to reduce it further:

backup database syslogs to backupfile
DBCC SHRINKFILE (syslogs_log)

A You should change your backup database to a backup log statement. Alternatively you could put your database into simple recovery mode and issue the shrinkfile statement. After the shrink of the log is finished, set the database to its previous recovery model and back up the database. If it's still not shrinking, check to make sure you have no open transactions (use dbcc opentran). The following Knowledge Base article provides more information: support.microsoft.com/kb/907511.

Q If failover occurs during a scheduled SQL Server Agent job, what happens to that job after failover? Do I have to restart it manually?

A Yes, you will need to start it manually if you don't have some other process in place. If you don't want to have to restart jobs manually, you could write a script that updates a table upon job completion. If the value = 1, the job has been run; any other value indicates that the job was not completed and a second job will come in later and issue the start command. So while the job needs to be rerun if a failover occurs during its pass, by writing a script you can alleviate some of the worry for those crucial, middle-of-the-night jobs that absolutely must be completed before the next business day.

Tip: Upgrading and DBCC UPDATEUSAGE

Are you upgrading from SQL Server 2000 to SQL Server 2005?

If so, make sure you run DBCC UPDATEUSAGE right after your databases are upgraded.

DBCC UPDATEUSAGE reports and corrects pages and row count inaccuracies in the catalog views. These inaccuracies need to be corrected because they may cause incorrect space usage reports to be returned by the sp_spaceused system stored procedure. In SQL Server 2005, these values are always maintained correctly, so these databases should never experience incorrect counts. However, databases upgraded to SQL Server 2005 may contain invalid counts, so you should run DBCC UPDATEUSAGE after upgrading.

Here's how DBCC UPDATEUSAGE works. It corrects the rows, used pages, reserved pages, leaf pages, and data page counts for each partition in a table or index. If there are no inaccuracies in the system tables, DBCC UPDATEUSAGE returns no data. If inaccuracies are found and corrected and you have not used WITH NO_INFOMSGS, DBCC UPDATEUSAGE returns the rows and columns being updated in the system tables.

DBCC UPDATEUSAGE can also be used to synchronize space-usage counters. Because DBCC UPDATEUSAGE can take some time to run on large tables or databases, it should typically be used only when you suspect incorrect values are being returned by sp_spaceused. Note that sp_spaceused accepts an optional parameter to run DBCC UPDATEUSAGE before returning space information for the table or index.

DBCC CHECKDB has been enhanced in SQL Server 2005 to detect when page or row counts become negative. When negatives are detected, DBCC CHECKDB will output a warning and a recommendation to run DBCC UPDATEUSAGE to address the issue. Although it may appear as if upgrading the database to SQL Server 2005 caused this problem, be assured that the invalid counts existed before the upgrade procedure.

As an example, here's how you would update page or row counts or both for all objects in the current database. The following command specifies 0 for the database name and DBCC UPDATEUSAGE reports updated information for the current database:

DBCC UPDATEUSAGE (0);
GO

To update page or row counts or both for, say, AdventureWorks, and also to suppress informational messages, you would run a command similar the following, which specifies AdventureWorks as the database name and then suppresses all informational messages:

USE AdventureWorks;
GO
DBCC UPDATEUSAGE ('AdventureWorks') WITH NO_INFOMSGS; GO

For more information, search for DBCC UpdateUsage in SQL Server Books Online.

**Thanks to the following Microsoft IT pros for their technical expertise:**Ken Adamson, Sunil Agarwal, Siggi Bjarnason, Shaun Cox, Laurentiu Cristofor, Ernie DeVore, Michael Epprecht, Lucien Kleijkers, Raymond Mak, Chat Mishra (MSLI), Niraj Nagrani, Rick Salkind, Jacco Schalkwijk, Vijay Sirohi, Vijay Tandra Sistla, Matthew Stephen, and Buck Woody. Thanks to Saleem Hakani for this month's tip.

Additional resources