Enterprise Information Management using SSIS, MDS, and DQS Together [Tutorial]
Managing information in an enterprise typically involves integrating data from across the enterprise and beyond, cleansing the data, matching the data to remove any duplicates, standardizing the data, enriching the data, making the data conform to legal and compliance requirements, and then storing the data in a centralized location with all the necessary security settings.
SQL Server 2014 provides all the components needed for an effective Enterprise Information Management (EIM) solution in a single product. Key components that help you build an EIM solution are:
SQL Server Integration Services
SQL Server Data Quality Services
SQL Server Master Data Services
SQL Server Integration Services (SSIS) provides a powerful, extensible platform for integrating data from various sources in a comprehensive extract, transform, and load (ETL) solution that supports business workflows, a data warehouse, or master data management. See Integration Services Overview topic for a quick overview and typical uses of SSIS.
SQL Server Data Quality Services (DQS) enables you to cleanse, match, standardize, and enrich data, so you can deliver trusted information for business intelligence, a data warehouse, and transaction processing workloads. See Introducing Data Quality Services topic for the business need for DQS and how DQS answers the need.
SQL Server Master Data Services (MDS) provides a central data hub that ensures that the integrity of information and consistency of data is constant across different applications. See Master Data Services Overview topic for brief descriptions of important features of MDS.
See Enterprise Information Management with SQL Server 2012 and Cleansing and Matching Master Data using EIM Technologies whitepapers for a comprehensive guidance on implementing an EIM solution using these Microsoft EIM technologies together and watch Enterprise Information Management (EIM): Bringing together SSIS, DQS, and MDS video for a cool demonstration of an EIM scenario
In this tutorial, you learn how to use SSIS, MDS, and DQS together to implement a sample Enterprise Information Management (EIM) solution. First, you use DQS to create a knowledgebase that contains knowledge about the data (metadata), cleanse the data in an Excel file by using the knowledge base, and match the data to identify and remove duplicates in the data. Next, you use the MDS Add-in for Excel to upload the cleansed and matched data to MDS. Then, you automate the whole process by using an SSIS solution. The SSIS solution in this tutorial reads the input data from an Excel file, but you can extend it to read from various sources such as Oracle, Teradata, DB2, and Windows Azure SQL Database.
Microsoft SQL Server 2012 with the following components installed.
Integration Services (SSIS)
Master Data Services (MDS)
Data Quality Services (DQS)
SQL Server Data Tools
See SQL Server 2012 Installation Guide for details about installing the product.
Use the Configuration Manager to create and configure a Master Data Services database. After you create the MDS database, create a web application for MDS in a web site (for example: http://localhost/MDS) and associate the MDS database with the MDS web application. Note that, to create an MDS web application, you should have IIS installed on your computer. See Web Application Requirements (Master Data Services) and Database Requirements (Master Data Services) for details about the prerequisites for configuring MDS database and web application.
Install and Configure DQS using Data Quality Server Installer. Click Start, click All Programs, click Microsoft SQL Server 2014, click Data Quality Services, and then click Data Quality Server Installer.
Microsoft Excel 2010 (32-bit is preferred).
Install Master Data Services Add-in for Excel (32-bit or 64-bit based on the version of Excel you have on your computer) from here. To find the version of Excel installed on your computer, run Excel, click File on menu bar and click Help to see the version in the right pane. Note that you need to install Visual Studio 2010 Tools for Office Runtime before installing the Excel Add-in.
(Optional) Create an account with Windows Azure Marketplace. One of the tasks in the tutorial requires you to have an Azure Marketplace (originally named Data Market) account. You can skip this task if you want and proceed with the next task.
Download the Suppliers.xls file from Microsoft Download Center.
DQS does not allow you to export the cleansing or matching results to an Excel file if you are using 64-bit version of Excel. This issue is a known issue. To work around the issue, do the following:
Run DQLInstaller.exe –upgrade. If you installed the default instance of SQL Server, the DQSInstaller.exe file is available at C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Binn. Double-click the DQSInstaller.exe file.
In Master Data Services Configuration Manager, click Select Database, select existing MDS database, and then click Upgrade.
Estimated time to complete (in minutes).
In this lesson, you create a DQS knowledge base named Suppliers.
In this lesson, you create and run a DQS project to cleanse the supplier data in an Excel file by using the Suppliers Knowledge Base you created in the first lesson.
In this lesson, you create a DQS project to perform matching activity to identify and remove duplicates from the cleansed suppler list.
In this lesson, you upload the cleansed and matched supplier data to Master Data Services (MDS) by using the MDS Add-in for Excel.
In this lesson, you create an SSIS solution that cleanses input data by using DQS, matches the cleansed data to remove duplicates, and stores the cleansed and matched data on MDS in an automated manner.
To begin the tutorial, continue to the first lesson: Lesson 1: Creating the Suppliers DQS Knowledge Base.