
Tasks Related to Storage Groups and Databases in a Clustered Mailbox Server
Administrative tasks associated with the storage groups and databases in a clustered mailbox server in a CCR environment include the following:
- Moving the location of storage group files or a database
- Viewing the status of storage group copies
- Mounting and dismounting databases
- Verifying the integrity of a storage group copy
- Recovering from corruption in a production storage group or a storage group copy
- Restoring CCR after experiencing a failure or some form of data corruption
Except for the recovery storage group, which is a special type of storage group, all storage groups and databases in a CCR environment are automatically enabled for continuous replication. Although replication and replay can be suspended, disabling continuous replication for one or more storage groups in a CCR environment is not possible because this would allow an outage to prevent access to particular databases.
When you create a new storage group in a CCR environment, seeding of the copy of the database on the passive node should occur automatically. If for some reason seeding does not automatically occur, you must manually seed the database copy. For detailed steps about how to seed a database copy, see How to Seed a Cluster Continuous Replication Copy.
Moving the Location of Storage Group Files or a Database
It may be necessary to move the location of storage group files or the location of a database in a CCR environment. The time it takes to move the file locations depends on the size of the database being moved, the number of transaction log files being moved, and the performance characteristics of the storage. During any move, the database will be dismounted.
In a CCR environment, relocating a storage group requires that both copies be relocated consistently, because the file locations on the active node and the passive node must be the same. Before a storage group or its database can be moved, you must dismount the database and suspend replication. To dismount the active copy, use the Dismount-Database cmdlet in the Exchange Management Shell. To suspend replication by the Microsoft Exchange Replication service before the move, and to resume it afterward, use the Suspend-StorageGroupCopy and Resume-StorageGroupCopy cmdlets.
Note: The Microsoft Exchange Replication service constantly monitors both the files in the copy location and the logs on the active node. Therefore, if you manipulate active logs in any way, you must first suspend activity for that storage group by using the Suspend-StorageGroupCopy cmdlet, which halts replication.
For detailed steps about how to move the location of storage group files in a CCR environment, see How to Move a Storage Group in a CCR Environment. For detailed steps about how to move the location of a database in a CCR environment, see How to Move a Database in a CCR Environment.
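The requirements for a move (identical file locations on both nodes, plus a dismounted database and suspended replication before any files are touched) can be expressed as a precondition check. The following Python is a minimal sketch, not part of Exchange: `NodeCopy` and `move_blockers` are hypothetical names, and the state and path values are assumed to have been gathered separately, for example from the Exchange Management Shell.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeCopy:
    """Hypothetical record of the planned file locations for one node."""
    node: str
    log_folder: str
    system_folder: str
    edb_file: str


def move_blockers(database_mounted: bool, copy_suspended: bool,
                  active: NodeCopy, passive: NodeCopy) -> List[str]:
    """Return the reasons a storage group move should not start yet.

    An empty list means the preconditions described in the text are met.
    """
    blockers = []
    if database_mounted:
        blockers.append("dismount the database first (Dismount-Database)")
    if not copy_suspended:
        blockers.append("suspend replication first (Suspend-StorageGroupCopy)")
    # Both nodes must end up with the same file locations.
    # Windows paths are case-insensitive, so compare them case-folded.
    for field in ("log_folder", "system_folder", "edb_file"):
        a, p = getattr(active, field), getattr(passive, field)
        if a.casefold() != p.casefold():
            blockers.append(f"{field} differs: {active.node}={a}, {passive.node}={p}")
    return blockers


active = NodeCopy("NODE1", r"E:\SG1\Logs", r"E:\SG1", r"F:\SG1\DB1.edb")
passive = NodeCopy("NODE2", r"E:\SG1\Logs", r"E:\SG1", r"F:\SG1\DB1.edb")
print(move_blockers(database_mounted=False, copy_suspended=True,
                    active=active, passive=passive))  # []
```

Returning a list of reasons, rather than a single Boolean, makes it clear which of the two conditions still has to be satisfied.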
Viewing the Status of Storage Group Copies
In the release to manufacturing (RTM) version of Microsoft Exchange 2007, you can only view CCR status information by using the Exchange Management Shell. In Exchange 2007 SP1, some of the status information listed in the following table can be viewed in the Exchange Management Console.
Exchange 2007 publishes a variety of status information for storage group copies. The following table describes the status information that is available. In the following table, the attributes are listed in the order in which they appear in the complete output of the Get-StorageGroupCopyStatus cmdlet. For detailed steps about viewing status information, see How to View the Status of a Storage Group in a CCR Environment.
Status information available for CCR-enabled storage groups
| Attribute | Description |
|---|---|
| Identity | Identity of the queried storage group. This attribute gives the <ServerName>\<StorageGroupName>. |
| StorageGroupName | Name of the queried storage group. This attribute gives the storage group name. |
| SummaryCopyStatus | Current overall status of the passive copy. Possible values are: Not Supported: Current configuration does not support local continuous replication (LCR). Disabled: Storage group does not have a configured copy; there is no passive node configured for this clustered mailbox server. Failed: Verification failed, or the storage group is only partially configured for CCR. Seeding: Full database seeding is in progress. Stopped: Transaction log copying is stopped. Suspended: Transaction log copying and replay are stopped. Healthy: Passive copy is healthy and normal, and nothing is blocking or blocked. Exchange 2007 SP1 adds the following status values: Initializing: No log files have been closed, and the Microsoft Exchange Replication service is waiting for a closed log file to replicate; this status typically occurs when the Microsoft Exchange Replication service has just been started. Service Down: The Microsoft Exchange Replication service is not running or cannot be contacted. Resynchronizing: The Microsoft Exchange Replication service is performing an incremental reseed of the storage group copy. |
| Failed | Indicates that verification of the database or logs identified an inconsistency that prevents replication, or that there is a configuration or access problem with the active or passive copy. Possible values are True and False. |
| FailedMessage | Textual message that identifies the condition that caused replication to fail. It may not be the only replication problem area. |
| Seeding | Indicates that seeding is in progress. Possible values are True and False. |
| Suspend | Indicates that replication has been halted for the passive copy. This state prevents the database from advancing, and logs from being copied. Possible values are True and False. |
| SuspendComment | Optional comment area in which an administrator can provide a reason or note as to why replication activity was halted. |
| CopySuspend | Indicates that log copying has been halted for the passive copy. This prevents the log copy directory from changing. Possible values are True and False. |
| CopySuspendComment | Optional administrator comment providing a reason or note as to why log copy activity was halted. |
| CopyQueueLength | Number of transaction log files waiting to be copied to the passive copy log file folder. A copy is not considered complete until it has been checked for corruption. |
| ReplayQueueLength | Number of transaction log files waiting to be replayed into the passive copy. |
| LatestAvailableLogTime | Time stamp on the source storage group of the most recently detected new transaction log file. |
| LastCopyNotificationedLogTime | Time associated with the last new log generated by the active storage group and known to the copy. |
| LastCopiedLogTime | Time stamp on the source storage group of the last successful copy of a transaction log file. |
| LastInspectedLogTime | Time stamp on the target storage group of the last successful inspection of a transaction log file. |
| LastReplayedLogTime | Time stamp on the target storage group of the last successful replay of a transaction log file. |
| LastLogGenerated | Last log generation number that was known to be generated on the active copy of the storage group. |
| LastLogCopied | Last log generation number that was successfully copied to the passive copy log folder. |
| LastLogNotified | Last log generation number that was generated by the active storage group and known to the copy. |
| LastLogInspected | Last log generation number that was inspected for consistency and corruption. |
| LastLogReplayed | Last log generation number that was successfully replayed into the passive copy of the database. |
| LatestFullBackupTime | Time of the last full backup. |
| LatestIncrementalBackupTime | Time of the last incremental backup. |
| SnapshotBackup | Indicates whether the last full backup taken was a legacy streaming backup or a Volume Shadow Copy Service (VSS) backup snapshot. |
| SnapshotLatestFullBackup | Time of the last snapshot full backup. |
| SnapshotLatestIncrementalBackup | Time of the last snapshot incremental backup. |
| SnapshotLatestDifferentialBackup | Time of the last snapshot differential backup. |
| SnapshotLatestCopyBackup | Time of the last snapshot copy backup. |
| OutstandingDumpsterRequests | Outstanding requests and the time range (low-high) for the outstanding requests. |
| DumpsterStatistics | Transport dumpster statistics from all accessible Hub Transport servers. This value is displayed only when the DumpsterStatistics parameter is used with the Get-StorageGroupCopyStatus command. |
| DumpsterStatisticsNotAvailable | List of inaccessible Hub Transport servers. |
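If you capture the complete output of Get-StorageGroupCopyStatus in list form (for example, piped through Format-List) as text, the attributes in the preceding table can be loaded into a dictionary for further checks. The following Python sketch assumes that each attribute appears on a single line as `Name : Value`; wrapped values and nested objects such as DumpsterStatistics are not handled, and `parse_copy_status` is a hypothetical helper.

```python
def parse_copy_status(text: str) -> dict:
    """Parse 'Name : Value' lines (Format-List style) into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only; timestamps contain further colons.
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip blank lines and anything that does not start with an attribute name.
        if not sep or not name.isidentifier():
            continue
        status[name] = value.strip()
    return status


sample = r"""
Identity                         : CMS1\SG1
StorageGroupName                 : SG1
SummaryCopyStatus                : Healthy
CopyQueueLength                  : 0
ReplayQueueLength                : 2
LastInspectedLogTime             : 5/1/2008 10:15:00 AM
"""
status = parse_copy_status(sample)
print(status["SummaryCopyStatus"], int(status["ReplayQueueLength"]))  # Healthy 2
```

All values are kept as strings; converting queue lengths to integers and times to `datetime` objects is left to the caller, because the time format depends on the server's regional settings.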
You can quickly assess the health of a storage group copy by looking at the values for SummaryCopyStatus, CopyQueueLength, ReplayQueueLength, and LastInspectedLogTime. These attributes show whether the storage group copy is functioning correctly and whether the storage group copy is relatively up to date in both copying and replaying logs. If the following conditions occur, you should determine the cause and correct the problem:
- Copy is not in a healthy state.
- Copy queue length is more than 5.
- Replay queue length is more than 20.
- Last inspected log time is not a recent time. Inactivity on the storage group could cause this situation, but it could also indicate the Microsoft Exchange Replication service is stopped.
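The checklist above can be applied mechanically once the attribute values are available. The following Python sketch uses the thresholds stated in the list (copy queue more than 5, replay queue more than 20). The 15-minute limit for a "recent" inspection time is an assumption, because the guidance does not define it, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds taken from the checklist.
COPY_QUEUE_LIMIT = 5
REPLAY_QUEUE_LIMIT = 20
# Assumption: "recent" is not defined by the guidance; 15 minutes is illustrative only.
MAX_INSPECT_AGE = timedelta(minutes=15)


def copy_health_issues(summary_copy_status: str, copy_queue_length: int,
                       replay_queue_length: int, last_inspected_log_time: datetime,
                       now: datetime) -> list:
    """Return the checklist conditions that call for investigation."""
    issues = []
    if summary_copy_status != "Healthy":
        issues.append(f"copy is not healthy ({summary_copy_status})")
    if copy_queue_length > COPY_QUEUE_LIMIT:
        issues.append(f"copy queue length {copy_queue_length} > {COPY_QUEUE_LIMIT}")
    if replay_queue_length > REPLAY_QUEUE_LIMIT:
        issues.append(f"replay queue length {replay_queue_length} > {REPLAY_QUEUE_LIMIT}")
    if now - last_inspected_log_time > MAX_INSPECT_AGE:
        # Either the storage group is idle or the Replication service is stopped.
        issues.append("last inspected log time is not recent")
    return issues


now = datetime(2008, 5, 1, 10, 30)
print(copy_health_issues("Healthy", 2, 25, datetime(2008, 5, 1, 10, 28), now))
# ['replay queue length 25 > 20']
```

An empty list means none of the four conditions applies; it does not rule out the rare misleading cases described later in this section.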
You can express the two queue lengths in units of time as follows:
- Copy queue in time = LatestAvailableLogTime - LastCopiedLogTime
- Replay queue in time = LastCopiedLogTime - LastInspectedLogTime
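The two formulas above are simple timestamp subtractions. The following Python sketch applies them to `datetime` values that are assumed to have already been read from Get-StorageGroupCopyStatus; the function name and sample times are hypothetical.

```python
from datetime import datetime


def queue_times(latest_available_log_time: datetime,
                last_copied_log_time: datetime,
                last_inspected_log_time: datetime):
    """Express the copy and replay queues as time spans (timedelta objects)."""
    copy_queue = latest_available_log_time - last_copied_log_time
    replay_queue = last_copied_log_time - last_inspected_log_time
    return copy_queue, replay_queue


copy_q, replay_q = queue_times(datetime(2008, 5, 1, 10, 30),
                               datetime(2008, 5, 1, 10, 25),
                               datetime(2008, 5, 1, 10, 10))
print(copy_q, replay_q)  # 0:05:00 0:15:00
```

Expressing the queues in time rather than as file counts shows how far behind the passive copy is, independent of how quickly the active storage group generates logs.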
The replay queue length and copy queue length values are available as performance counters. They are the CopyQueueLength and ReplayQueueLength performance counters under the Microsoft Exchange Replication service performance object.
In some rare scenarios, the replication status can be misleading:
- A storage group that is not active (that is, not changing) can report as healthy when it is not healthy, because some unhealthy conditions cannot be detected until a log is replayed.
- During replication initialization, the replication status is still being evaluated and may not be accurate. When initialization completes, the status is updated.
- The value of the LastLogGenerated field can be wrong when a database is dismounted. However, if the storage group copy is replicating, all logs with end-user content are replicated.
- When one or more logs are missing from the middle of a log stream, the passive copy keeps trying to recover. While it does so, the replication status alternates between failed and healthy states, and the replay and copy queues continue to grow.
- Under some very rare conditions, a log can be successfully verified but still fail to replay. In this situation, the system alternates between failed and healthy states as it attempts to recover, and the replay and copy queues continue to grow.
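The last two scenarios produce a recognizable pattern across repeated status queries: SummaryCopyStatus alternates between Failed and Healthy while the queues keep growing. The following Python heuristic is a sketch, not an Exchange feature; the sample format, the function name, and the default threshold of two transitions are all assumptions.

```python
def looks_like_flapping(samples, min_transitions=2):
    """Heuristic check for alternating Failed/Healthy status with growing queues.

    samples: time-ordered (summary_copy_status, copy_queue_length,
    replay_queue_length) tuples collected by repeated status queries.
    """
    if len(samples) < 2:
        return False
    statuses = [status for status, _, _ in samples]
    # Count direct switches between Failed and Healthy.
    transitions = sum(
        1 for a, b in zip(statuses, statuses[1:]) if {a, b} == {"Failed", "Healthy"}
    )
    # The combined backlog never shrinks and ends larger than it started.
    backlog = [copy_q + replay_q for _, copy_q, replay_q in samples]
    growing = all(b >= a for a, b in zip(backlog, backlog[1:])) and backlog[-1] > backlog[0]
    return transitions >= min_transitions and growing


samples = [("Healthy", 1, 2), ("Failed", 2, 3), ("Healthy", 3, 5), ("Failed", 4, 8)]
print(looks_like_flapping(samples))  # True
```

A single status query cannot reveal this condition, which is why the sketch works on a series of samples.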
Note: In Exchange 2007 SP1, you can also use a new cmdlet called Test-ReplicationHealth to verify the health and status of storage groups enabled for continuous replication. For more information about the Test-ReplicationHealth cmdlet, see Test-ReplicationHealth.
Mounting and Dismounting Databases
It may occasionally be necessary to mount or dismount databases in a CCR environment. This could be required to perform a reconfiguration or to correct issues with the server or database. When the database is dismounted, it is frozen from further changes. Neither the database nor the log files are changed while the database is dismounted.
For more information about mounting databases in a CCR environment, see How to Mount a Database in a CCR Environment. For more information about dismounting databases in a CCR environment, see How to Dismount a Database in a CCR Environment.
Verifying the Integrity of a Storage Group Copy
When you use CCR, we recommend that you verify the integrity of the passive copy periodically by running a physical consistency check against the database and transaction log files. A physical consistency check examines the transaction logs and database files for corruption. You can perform the check by using Exchange Server Database Utilities (Eseutil.exe). For detailed steps about how to use Eseutil to check the transaction logs and database files for physical corruption, see How to Verify a Cluster Continuous Replication Copy.
Note: Before you run a physical consistency check against a database, you must temporarily suspend replication activity against the storage group copy. You can suspend transaction log replay activity by using the Suspend-StorageGroupCopy cmdlet in the Exchange Management Shell. When the consistency check has completed, you can resume transaction log replay activity by using the Resume-StorageGroupCopy cmdlet.
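The suspend, check, resume sequence is easy to leave half-finished if the check fails. The following Python sketch uses a context manager so that resuming always happens. It does not talk to Exchange: `run` is a hypothetical callback that would execute one Exchange Management Shell command line, and here it only records the commands. The Eseutil step is a placeholder; see How to Verify a Cluster Continuous Replication Copy for the actual command.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def replication_suspended(identity: str, run: Callable[[str], None]) -> Iterator[None]:
    """Suspend replication for a storage group copy and always resume it afterward.

    `run` is a hypothetical callback that executes (or records) one
    Exchange Management Shell command line.
    """
    run(f"Suspend-StorageGroupCopy -Identity '{identity}'")
    try:
        yield
    finally:
        # Resume even if the consistency check raised an error.
        run(f"Resume-StorageGroupCopy -Identity '{identity}'")


commands = []
with replication_suspended(r"CMS1\SG1", commands.append):
    commands.append("<Eseutil physical consistency check of the passive copy>")
print(commands)
```

The `try`/`finally` ensures that a failed or interrupted check does not leave replication suspended indefinitely.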
Recovering from Corruption in a CCR Environment
CCR enables you to recover from corruption or failures in a production storage group by initiating a scheduled outage. If the log files are not corrupted, the recovery should not cause any data loss. However, if some log files are not available, the recovery can only bring the storage group back to the most recent point in time that is consistent with the uncorrupted changes that the copy received. An additional constraint is that no change data earlier than that point in time can be missing or corrupted.
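The constraint that no change data earlier than the recovery point may be missing or corrupted means that the usable recovery point is the end of the first unbroken run of good transaction logs. The following Python sketch assumes that logs are identified by consecutive generation numbers (as in the LastLogGenerated attribute) and that you already know which generations are present and uncorrupted; the function name is hypothetical.

```python
from typing import Optional, Set


def last_recoverable_generation(first_generation: int,
                                usable_generations: Set[int]) -> Optional[int]:
    """Return the highest log generation N such that every generation from
    first_generation through N is present and not corrupted, or None if the
    first generation itself is unusable."""
    if first_generation not in usable_generations:
        return None
    generation = first_generation
    # Walk forward until the first missing or corrupted generation.
    while generation + 1 in usable_generations:
        generation += 1
    return generation


# Generation 4 is missing, so 5 and 6 cannot be used even though they are intact.
print(last_recoverable_generation(1, {1, 2, 3, 5, 6}))  # 3
```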
For detailed steps that explain how to recover from corruption or failures in a CCR environment, see the following topics:
Restoring CCR After a Failure or Corruption Occurs
CCR provides functionality to automatically recover after a failure. However, there are still cases where manual recovery is required. Those cases are:
- Database file is corrupted on the passive copy: For detailed steps that explain how to restore CCR after database corruption occurs, see How to Restore After Database Corruption Occurs.
- Database or log volume has failed on the passive copy: For detailed steps that explain how to restore CCR after a volume failure occurs, see How to Restore After a Volume Failure.
- Database has failed or has diverged: CCR detects and reports when database divergence has occurred as a result of a failure. In general, this occurs when a database copy is made available and the failed database copy has more changes than the automatic mount criteria allow. For detailed steps that explain how to restore CCR after a database failure or divergence occurs, see How to Restore CCR Functionality After a Failure or Divergence.