In the case of a lossy failure, there is the potential for the original active and new active copies of a database to diverge, requiring a full reseed of the database on the original active when it recovers. A feature of ESE called lost log resilience (LLR) provides protection of the original copy of the database in order to minimize the need to perform full database reseeds.
In a lossy failure, at least one log file per storage group (Exx.log) is missing, and additional closed log files could be missing as well. This means that if the databases are brought online, the data stored on NodeA (the failed node) is different from the data being generated on NodeB. This is referred to as divergence.
Divergence occurs when a storage group copy has information that is not in the active storage group. Divergence can be in the database or in the log files. It can be caused by lossy failures, and administrators can also cause it by performing an offline defrag or hard repair of the active copy.
Divergence is problematic because it means that data has been lost. In particular, database divergence is the worst case because it guarantees the need to reseed, which can be an expensive operation in terms of time and possibly bandwidth. Log file divergence also means data has been lost. However, log file divergence doesn’t necessarily cause database divergence because of LLR.
Remember that the order of write operations of Exchange data is always memory, log file, and then database file. LLR works by delaying recent updates to the database file until a specified number of additional log generations have been created. The delay is short, and its length depends on how quickly logs are being generated.
LLR provides the ability to force database changes to be held in memory until one or more additional log generations are created. In a nutshell, this forces the active database file to remain a few generations behind the log files that are created, thus reducing the likelihood of database divergence between the copies.
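The lag described above can be illustrated with a toy model. This is a minimal sketch, not ESE's actual implementation: the function names are hypothetical, and it assumes that updates logged in generation g become eligible to reach the database file only once the log stream has advanced a full LLR depth past g.

```python
def flushable_generations(current_generation: int, llr_depth: int) -> range:
    """Generations whose changes may have reached the database file.

    Toy model (an assumption for illustration): a generation becomes
    flushable once llr_depth newer generations exist.
    """
    highest = max(current_generation - llr_depth, 0)
    return range(1, highest + 1)


def held_back_generations(current_generation: int, llr_depth: int) -> range:
    """Generations whose changes are still held in memory in the toy model."""
    first = max(current_generation - llr_depth, 0) + 1
    return range(first, current_generation + 1)


if __name__ == "__main__":
    # With the log stream at generation 20 and a depth of 10, the database
    # file trails the logs by the ten most recent generations.
    print(list(held_back_generations(20, 10)))
```

In this model, losing any of the held-back generations leaves the database file untouched by the lost data, which is why the copies are less likely to diverge at the database level.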
| Note: |
|---|
| LLR only runs on the active storage group copy. If you analyze the passive copy, you will see that its database is always up-to-date (in terms of the log files that exist on the passive node). |
LLR introduces a new marker for log files. From a log replay perspective, there have always been two key markers.
- Committed: The committed marker tells ESE the last log generated. The term is somewhat of a misnomer because it does not mean that the log records contained within the log file are actually written to the database file.
- Checkpoint: The checkpoint marker tells ESE the minimum log file required for replay. All log files prior to (below) the checkpoint contain log records that have been written into the database file.
Exchange 2007 added an additional marker for LLR and divergence detection.
- Waypoint: The waypoint marker tells ESE the maximum log file required for replay. All logs from the checkpoint up to and including the waypoint are required for recovery, because some of those logs contain data that has been written to the database. Log files after (above) the waypoint have not been written to the database.
To ensure understanding of these markers, let’s look at an example output of a database that was not shut down cleanly.
Initiating FILE DUMP mode...
Database: priv1.edb
...
State: Dirty Shutdown
Log Required: 2-10 (0x2-0xA)
Log Committed: 0-20 (0x0-0x14)
...
The output of this EDB file tells us the three markers.
- Committed is log generation 20 (the last log generation listed in “Log Committed”).
- Checkpoint is log generation 2 (the first log generation listed in “Log Required”).
- Waypoint is log generation 10 (the last log generation listed in “Log Required”).
This means that log generations 11 through 20 have not been committed to the EDB file. But what exactly does this mean? Since log generations 11 through 20 have not been committed to the database, they can be discarded by LLR. As long as we have log generations 2 through 10, we do not have to perform a database reseed.
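As an illustration of how the three markers are read from the dump, the following sketch extracts them from the "Log Required" and "Log Committed" lines. It assumes the two lines are formatted exactly as in the sample output above; `parse_markers` and `SAMPLE` are hypothetical names introduced here, not part of any Exchange tool.

```python
import re

# Header lines copied from the sample dump above.
SAMPLE = """\
State: Dirty Shutdown
Log Required: 2-10 (0x2-0xA)
Log Committed: 0-20 (0x0-0x14)
"""


def parse_markers(header_text: str) -> dict:
    """Return the checkpoint, waypoint, and committed generations."""
    required = re.search(r"Log Required:\s*(\d+)-(\d+)", header_text)
    committed = re.search(r"Log Committed:\s*(\d+)-(\d+)", header_text)
    if required is None or committed is None:
        raise ValueError("missing 'Log Required' or 'Log Committed' line")
    return {
        "checkpoint": int(required.group(1)),  # first generation in Log Required
        "waypoint": int(required.group(2)),    # last generation in Log Required
        "committed": int(committed.group(2)),  # last generation in Log Committed
    }


markers = parse_markers(SAMPLE)
# Generations above the waypoint have not reached the database file.
not_in_database = list(range(markers["waypoint"] + 1, markers["committed"] + 1))

if __name__ == "__main__":
    print(markers)
    print(not_in_database)
```

For the sample dump, the generations above the waypoint are 11 through 20, which matches the range described in the text.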
The LLR depth value depends on the mailbox server configuration. In CCR environments running SP1, the LLR depth is 10, which means that at any given time, the data contained in the last 10 log files has not been committed to the database file. For all other SP1 mailbox servers (SCC, and standalone mailbox servers with or without LCR), the LLR depth is 1.
| Note: |
|---|
| In the RTM version of Exchange 2007, the LLR depth value in CCR was variable: it depended on the AutoDatabaseMountDial setting and was equal to AutoDatabaseMountDial+1. With the default AutoDatabaseMountDial setting, the LLR depth for a CCR environment running RTM code was 6+1, or 7. This meant that if replication was more than seven log files behind, not only did the databases not mount on lossy failure, but you also had to perform a database reseed. By removing the dependency on AutoDatabaseMountDial and setting the LLR depth to 10 for CCR in SP1, the likelihood of database reseeds is further reduced, even in scenarios where the databases do not automatically mount. |
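The depth rules above can be summarized in a small lookup. This is only a sketch of the rules as stated in this section: the function name, the configuration labels, and the string version tags are hypothetical, and the RTM depth for non-CCR servers is not given in the text, so the sketch refuses to guess it.

```python
def llr_depth(configuration: str, version: str = "SP1",
              auto_database_mount_dial: int = 6) -> int:
    """Return the LLR depth described in the text for a mailbox server.

    configuration: "CCR", "SCC", or "standalone" (hypothetical labels).
    version: "SP1" or "RTM".
    auto_database_mount_dial: only used for CCR on RTM; 6 is the default
    value cited in the text.
    """
    is_ccr = configuration.upper() == "CCR"
    if version == "SP1":
        return 10 if is_ccr else 1
    if version == "RTM" and is_ccr:
        return auto_database_mount_dial + 1
    # The text does not state the depth for this combination.
    raise ValueError(f"LLR depth not documented for {configuration} on {version}")


if __name__ == "__main__":
    print(llr_depth("CCR"))          # SP1 CCR
    print(llr_depth("SCC"))          # SP1 single copy cluster
    print(llr_depth("CCR", "RTM"))   # RTM CCR with the default dial
```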
In the absence of user or database activity, ESE now also forces the active log file to close, thus ensuring that if there is data within that log file, it will be replicated to the database copies. The log roll behavior is based on the value of the LLR depth. The log roll mechanism does not generate transaction logs in the absence of user or other database activity. In fact, log roll is designed to occur only when there is a partially filled log. To calculate when log roll should occur (provided the conditions for log rolling are met), the system uses the following formula:
[15 (minutes) ÷ LLR Depth value] = Frequency of log roll activity (in minutes)
The following table lists the maximum number of transaction log files that will be generated each day by an idle storage group as a result of log roll activity.
Maximum number of logs generated each day due to log roll activity

| Mailbox server configuration | Maximum number of transaction logs generated per day by an idle storage group |
|---|---|
| Stand-alone (with or without LCR); Single copy cluster | 96 |
| Cluster continuous replication | 960 |
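The figures in the table follow directly from the log roll formula. The sketch below works through the arithmetic using the LLR depth values given earlier (1 for stand-alone and SCC, 10 for CCR); the function names are hypothetical.

```python
from fractions import Fraction

MINUTES_PER_DAY = 24 * 60


def log_roll_interval_minutes(llr_depth: int) -> Fraction:
    """Frequency of log roll activity: 15 minutes divided by the LLR depth."""
    return Fraction(15, llr_depth)


def max_idle_logs_per_day(llr_depth: int) -> int:
    """Upper bound on logs an idle storage group generates per day."""
    return int(Fraction(MINUTES_PER_DAY) / log_roll_interval_minutes(llr_depth))


if __name__ == "__main__":
    for name, depth in [("Stand-alone / SCC", 1), ("CCR", 10)]:
        interval = float(log_roll_interval_minutes(depth))
        print(f"{name}: every {interval} min, at most "
              f"{max_idle_logs_per_day(depth)} logs per day")
```

With a depth of 1, a log roll can occur every 15 minutes, giving at most 96 logs per day. With a depth of 10, it can occur every 1.5 minutes, giving at most 960. These are upper bounds, because log roll only happens when there is a partially filled log.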
Consider a CCR environment where NodeA owns the clustered mailbox server and NodeB is the passive node. NodeA is generating and closing log files and NodeB is copying the closed log files and replaying them into the passive copy of the database.
Figure: NodeA generating log files and shipping them to NodeB
Now let’s assume NodeA fails. In this case, we can see that NodeA created generation 4 and generation 5 log files for storage group 1, but neither log file was copied to NodeB before NodeA failed.
Figure: NodeA has failed and NodeB begins generating log files as the new active node, starting with generation 4
As a result of the failure, the following will occur on NodeB:
- The Replication service will attempt to copy the missing log files from NodeA in order to prevent a lossy activation of the storage groups.
- If the copy attempt fails, the Replication service will calculate the log loss by subtracting LastLogInspected from LastLogGenerated for each storage group and comparing this value with the clustered mailbox server’s AutoDatabaseMountDial value. If the difference is less than the AutoDatabaseMountDial value, then the storage group will mount. If the difference is more than the AutoDatabaseMountDial value, then the storage group will not mount.
| Note: |
|---|
| If the databases are beyond the AutoDatabaseMountDial value in terms of missing logs, they will not automatically mount. In this scenario, the Replication service will "wake up" every 30 seconds and try to contact the passive node (the original active node that failed) to copy the missing log files. If it can copy enough log files to reduce the "lossy-ness" to an acceptable amount, then the database will come online. There is also an option to force the databases to mount after a specified time by using the ForcedDatabaseMountAfter setting; however, this feature should be used only with extreme caution because it can cause database divergence. |
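The mount decision described above can be sketched as a simple comparison. This is an illustrative sketch, not the Replication service's code: the function names are hypothetical, and because the text only covers the "less than" and "more than" cases, treating a loss exactly equal to AutoDatabaseMountDial as acceptable is an assumption made here.

```python
def log_loss(last_log_generated: int, last_log_inspected: int) -> int:
    """Number of log files lost: LastLogGenerated minus LastLogInspected."""
    return last_log_generated - last_log_inspected


def should_mount(last_log_generated: int, last_log_inspected: int,
                 auto_database_mount_dial: int) -> bool:
    """Decide whether a storage group mounts automatically after a lossy failure.

    Assumption: a loss equal to the dial value is treated as mountable;
    the text does not state what happens in that case.
    """
    loss = log_loss(last_log_generated, last_log_inspected)
    return loss <= auto_database_mount_dial


if __name__ == "__main__":
    # Values inferred from the example: NodeA created generations 4 and 5,
    # neither was copied, so NodeB last inspected generation 3.
    print(log_loss(5, 3))
    print(should_mount(5, 3, auto_database_mount_dial=6))
```

In the example scenario the loss is two log files, which is well under the default dial value of 6 cited earlier, so storage group 1 mounts on NodeB.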
For each storage group that mounts, the Replication service will initiate a transport dumpster redelivery request to recover recent messages that had been submitted to the clustered mailbox server immediately before the time of failure. This is discussed in detail in the "Transport Dumpster" section.
Also note that as a result of storage group 1 being mounted on NodeB, the storage group will generate log files and will continue with the generation sequence of the log stream, starting with log generation 4. As can be seen from the figure above, storage group 1 generates generation 4 through generation 6 log files. How Exchange deals with the log streams that exist on NodeA and NodeB when NodeA returns is discussed in the "Incremental Reseed" section.