Automatic Page Repair During a Database Mirroring Session

Beginning with SQL Server 2008, a database mirroring partner tries to automatically recover from corrupted pages on the mirror database by resolving certain types of errors that prevent reading a data page. The partner that is unable to read a page requests a fresh copy from the other partner. If this request succeeds, the unreadable page is replaced by the copy, which usually resolves the error.

Note

Automatic page repair by database mirroring partners differs from DBCC repair. All of the data is preserved by an automatic page repair. In contrast, correcting errors by using the DBCC REPAIR_ALLOW_DATA_LOSS option might require that some pages, and therefore data, be deleted.

Error Types that Cause an Automatic Page-Repair Attempt

Database mirroring automatic page repair tries to repair only pages in a data file on which an operation has failed for one of the errors listed in the following table.

Error number

Description

Instances that cause automatic page-repair attempt

823

Action is taken only if the operating system performed a cyclic redundancy check (CRC) that failed on the data.

ERROR_CRC. The operating-system value for this error is 23.

824

Logical errors.

Logical data errors, such as torn write or bad page checksum.

829

A page has been marked as restore pending.

All.

To view recent 823 CRC errors and 824 errors, see the suspect_pages table in the msdb database.

Page Types That Cannot be Automatically Repaired

The following control page types cannot be repaired by database mirroring:

  • File header page (page ID 0).

  • Page 9 (the database boot page).

  • Allocation pages: Global Allocation Map (GAM) pages, Shared Global Allocation Map (SGAM) pages, and Page Free Space (PFS) pages.

Handling I/O Errors on the Principal Database

On the principal database, automatic page repair is tried only when the node is in the SYNCHRONIZED state and the principal server is still sending log records to the mirror server. The basic sequence of actions in an automatic page-repair attempt are as follows:

  1. When a read error occurs on a data page in the principal database, the principal server inserts a row in the suspect_pages table with the appropriate error status. The principal server then requests a copy of the page from the mirror server. The request specifies the page ID and the LSN that is currently at the end of the flushed log. The page is marked as restore pending. This makes it inaccessible during the automatic page-repair attempt. Attempts to access this page during the repair attempt will fail with error 829 (restore pending).

  2. After receiving the page request, the mirror server waits until it has redone the log up to the LSN specified in the request. Then, the mirror server tries to access the page in the mirror database. If the page can be accessed, the mirror server sends the copy of the page to the principal server. Otherwise, the mirror server returns an error to the principal server, and the automatic page-repair attempt fails.

  3. The principal server processes the response that contains the fresh copy of the page.

  4. After the automatic page-repair attempt fixes a suspect page, the page is marked in the suspect_pages table as restored (event_type = 4).

  5. If the page I/O error caused any deferred transactions, after you repair the page, the principal server tries to resolve those transactions.

Handling I/O Errors on the Mirror Database

I/O errors on data pages that occur on the mirror database are handled in the following way.

  1. If the mirror server encounters one or more page I/O errors when it redoes a log record, the mirroring session enters the SUSPENDED state. At that point, the mirror server inserts a row in the suspect_pages table with the appropriate error status. The mirror server then requests a copy of the page from the principal server.

  2. The principal server tries to access the page in the principal database. If the page can be accessed, the principal server sends the copy of page to the mirror server.

  3. If the mirror server receives copies of every page it has requested, the mirror server tries to resume the mirroring session. If an automatic page-repair attempt fixes a suspect page, the page is marked in the suspect_pages table as restored (event_type = 5).

    If a mirror server does not receive a page that it requested from the principal server, the automatic page-repair attempt fails and the mirroring session remains suspended. If the mirroring session is resumed manually, the corrupted pages will be hit again during the synchronization phase.

Developer Best Practice

An automatic page repair is an asynchronous process that runs in the background. Therefore, even for a mirrored database, a database operation that requests an unreadable page fails, and it returns the error code for whatever condition caused the failure. When developing an application for a mirrored database, you should intercept exceptions for failed operations. If the SQL Server error code is 823, 824, or 829, you should retry the operation later.