Data Warehousing and Reporting
Replication is often used in data warehousing and reporting applications to:
Consolidate data so it can be transformed and moved into the data warehousing environment.
Distribute data to read-only databases for reporting.
Distribute data to an online analytical processing (OLAP) database.
Although replication does not replicate Microsoft SQL Server 2008 Analysis Services (SSAS) objects (for example, dimensions or cubes), it is often used to distribute data from online transaction processing (OLTP) databases to staging databases and databases that are used for reporting, decision support and analysis purposes.
The following diagram shows a typical scenario, with data replicated from an online processing server to both a reporting server and a staging server for OLAP and ROLAP analysis.
Adventure Works Cycles is a fictional manufacturing company used to demonstrate database concepts and scenarios. For more information, see AdventureWorks Sample Databases.
Adventure Works Cycles uses data warehousing and reporting in a number of departments, including Manufacturing and Human Resources.
Manufacturing stores historical data on manufacturing defects and a number of other quality and performance metrics. Data is replicated from servers at the manufacturing facility to a staging server at the company headquarters. From there the data is transformed and loaded into OLAP cubes for analysis.
Human Resources currently produces reports using a third-party application. They plan to replace this application with Reporting Services. They would also like to expand their reporting capabilities and add the ability to perform these types of analysis:
Compensation and benefits analysis, including analyzing the impact of international currency exchange rates.
Payroll cost simulations and forecasting.
They will bring a new server online to handle the increased need for reports through the company. Data will be replicated from Human Resources and other departments to this central read-only reporting server.
Data warehousing and reporting applications typically have the following requirements, which an appropriate replication solution must address:
The system must maintain transactional consistency.
The system should have low latency: updates at the online processing server should reach the staging and reporting servers quickly.
The system should have high throughput: it should handle the replication of a large number of transactions.
Replication processing should require minimal overhead on the online processing server.
Data changes flow in one direction, from the online processing server to the staging and reporting servers.
The data required at the staging and reporting servers might be a subset of the data available at the online processing server.
SQL Server uses a publishing industry metaphor to describe the components of the replication system. The components include the Publisher, Subscribers, publications and articles, and subscriptions.
In the diagram above, the online processing server is the Publisher. Some or all of the data at the online processing server is included in two publications (one for staging and one for reporting), with each table of data being an article (articles can also be other database objects, such as stored procedures). The staging server and reporting server are Subscribers to one of the publications, with each server receiving schema and data as a subscription. For more information on the components of the system, see Replication Publishing Model Overview.
SQL Server offers different types of replication for different application requirements: snapshot replication, transactional replication, and merge replication. This scenario is best implemented with transactional replication, which is well suited to handle the requirements outlined in the previous section. For more information on transactional replication, see Transactional Replication Overview and How Transactional Replication Works.
By design, transactional replication addresses the principal requirements for this scenario:
The primary option to consider for this scenario is filtering. Transactional replication allows you to filter columns and rows, so the tables at the staging and reporting servers contain only the data required by your application. For more information, see Filtering Published Data.
To implement this scenario, you must first create a publication and subscriptions, and then initialize each subscription. Click the links below for more information about each step:
After the subscription is initialized and data is flowing between the Publisher and Subscribers, you might need to consult the following topics for information on common administration and monitoring tasks: