Data Quality Services Concepts
Updated: January 2, 2012
Applies To: SQL Server 2016
This topic provides a brief summary of Data Quality Services (DQS) concepts in knowledge management, data quality projects, and data quality administration.
The DQS knowledge base is a repository of metadata that is created by the data steward or IT pro for use in improving data quality through data cleansing and data matching. DQS knowledge management includes the processes used to create and manage the knowledge base, both in a computer-assisted manner and interactively.
Knowledge discovery is a computer-assisted process that analyzes samples of your organization’s data to build knowledge about the data. Once you have the results of the analysis, you can validate and enhance the knowledge, and then apply it to perform data cleansing, matching, and profiling. For more information, see DQS Knowledge Bases and Domains.
The domain management process enables you to change or augment the knowledge that has been generated by the knowledge discovery process. You can interactively edit, update, and review the knowledge in a knowledge base. A knowledge base consists of data domains, each of which contains domain values and their status, domain rules, term-based relations, and reference data. In domain management, you can change domain properties, attach reference data to a domain, manage domain rules, manage domain values and term-based relations, and create, delete, import, or export domains. You can also use composite domains, which combine two or more single domains. For more information, see DQS Knowledge Bases and Domains.
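To make the parts of a domain more concrete, the following is a minimal Python sketch of a domain as an in-memory structure. It is illustrative only and does not reflect how DQS stores or evaluates a knowledge base. The `Domain` class, its field names, the status label, and the sample company values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Domain:
    """Hypothetical in-memory model of one knowledge-base domain."""
    name: str
    # Known domain values mapped to a status label (label names are assumed).
    value_status: Dict[str, str] = field(default_factory=dict)
    # Term-based relations: a term mapped to the term that replaces it.
    term_relations: Dict[str, str] = field(default_factory=dict)
    # Domain rules: predicates that a valid value must satisfy.
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def apply_term_relations(self, value: str) -> str:
        """Replace each whitespace-separated term that has a relation."""
        return " ".join(self.term_relations.get(term, term) for term in value.split())

    def passes_rules(self, value: str) -> bool:
        """Return True only when every domain rule accepts the value."""
        return all(rule(value) for rule in self.rules)


# Sample domain; every value below is made up for illustration.
company = Domain(
    name="CompanyName",
    value_status={"Contoso Incorporated": "correct"},
    term_relations={"Inc.": "Incorporated", "Corp.": "Corporation"},
    rules=[lambda v: len(v) <= 50],
)

normalized = company.apply_term_relations("Contoso Inc.")
print(normalized, company.value_status.get(normalized, "unknown"))
```

In this sketch the term-based relation is applied first, so that the normalized value can then be looked up among the known domain values and checked against the domain rules.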
A matching policy contains the matching rules used to perform data deduplication. The matching policy process enables you to create matching rules, fine-tune them based on matching results and profiling data, and add the policy to the knowledge base. For more information, see Data Matching.
Reference Data Services
You can use reference data to validate, correct, and enrich your data, relying on the services of companies that guarantee the quality of their reference data. You can use the services of Windows Azure Marketplace to connect to reference data providers, or you can connect directly to a provider. For more information, see Reference Data Services in DQS.
For more information about knowledge management in DQS, see DQS Knowledge Bases and Domains.
The data steward performs data-quality operations (cleansing and matching) using a data quality project in the Data Quality Client application.
Data cleansing in DQS is based on the knowledge in a DQS knowledge base and is a two-step process:
Computer-assisted cleansing: DQS uses the knowledge in the knowledge base selected for the cleansing project to propose corrections and suggestions for the values in a data source.
Interactive cleansing: The data steward can change or augment the corrections proposed by the computer-assisted cleansing process. The data steward does so by reviewing the confidence levels and statistics produced by the cleansing process, or by manually entering changes in the project.
After cleansing the data, the data steward can export the processed data to a SQL Server database, a .csv file, or an Excel file. For more information, see Data Cleansing.
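The two-step cleansing flow described above can be sketched in Python as follows. This is a simplified illustration, not the DQS cleansing algorithm: it scores similarity with the standard-library `difflib.SequenceMatcher`, and the threshold values, status labels, the `propose` and `review` functions, and the sample city values are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List

# Assumed confidence thresholds for this sketch only.
AUTO_CORRECT_MIN = 0.8  # at or above: the correction is applied
SUGGEST_MIN = 0.7       # at or above: the correction is only suggested


@dataclass
class Proposal:
    source: str        # value as found in the data source
    output: str        # value proposed by the computer-assisted step
    confidence: float  # similarity score between 0.0 and 1.0
    status: str        # "correct", "corrected", "suggested", or "new"


def propose(value: str, known_values: List[str]) -> Proposal:
    """Step 1 (computer-assisted): propose an output for one source value."""
    if value in known_values:
        return Proposal(value, value, 1.0, "correct")
    # Pick the known value that is most similar to the source value.
    best = max(known_values, key=lambda k: SequenceMatcher(None, value, k).ratio())
    score = SequenceMatcher(None, value, best).ratio()
    if score >= AUTO_CORRECT_MIN:
        return Proposal(value, best, score, "corrected")
    if score >= SUGGEST_MIN:
        return Proposal(value, best, score, "suggested")
    return Proposal(value, value, score, "new")


def review(proposals: List[Proposal], decisions: Dict[str, str]) -> Dict[str, str]:
    """Step 2 (interactive): apply steward decisions (source -> final value).

    Without a decision, corrections are kept and suggestions are not applied.
    """
    final = {}
    for p in proposals:
        default = p.source if p.status == "suggested" else p.output
        final[p.source] = decisions.get(p.source, default)
    return final


known = ["Seattle", "Redmond", "Kent"]  # made-up knowledge for the example
proposals = [propose(v, known) for v in ["Redmond", "Seatle", "Kemt", "Portland"]]
final = review(proposals, {"Kemt": "Kent"})  # the steward approves one suggestion
for p in proposals:
    print(p.source, p.status, round(p.confidence, 2), "->", final[p.source])
```

Keeping the two thresholds separate lets high-confidence corrections pass through automatically while lower-confidence ones wait for the data steward's decision.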
The matching process enables the data steward to compare data so that similar, but slightly different, data can be aligned through a deduplication process. DQS performs deduplication based on matching rules contained in the knowledge base; the data steward specifies parameters for the matching process from within a data quality project. For more information, see Data Matching.
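As a rough illustration of rule-based deduplication, the Python sketch below scores pairs of records with a weighted matching rule and groups records whose score meets a minimum matching score. It is not the DQS matching algorithm: the field weights, the threshold, the greedy clustering strategy, the use of `difflib.SequenceMatcher`, and the sample records are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical matching rule: field name -> weight (weights sum to 1.0).
RULE_WEIGHTS = {"name": 0.6, "city": 0.4}
MIN_MATCHING_SCORE = 0.8  # assumed threshold for treating records as a match


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matching_score(r1: Record, r2: Record, weights: Dict[str, float]) -> float:
    """Weighted sum of the per-field similarities named in the rule."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in weights.items())


def cluster(records: List[Record], weights: Dict[str, float],
            min_score: float) -> List[List[Record]]:
    """Greedy clustering: a record joins the first cluster whose first
    record it matches; otherwise it starts a new cluster."""
    clusters: List[List[Record]] = []
    for record in records:
        for group in clusters:
            if matching_score(group[0], record, weights) >= min_score:
                group.append(record)
                break
        else:
            clusters.append([record])
    return clusters


# Made-up sample records.
records = [
    {"name": "Contoso Ltd", "city": "Seattle"},
    {"name": "Contoso Ltd.", "city": "Seattle"},
    {"name": "Fabrikam Inc", "city": "Redmond"},
]

clusters = cluster(records, RULE_WEIGHTS, MIN_MATCHING_SCORE)
for group in clusters:
    print([r["name"] for r in group])
```

Raising the threshold produces fewer and tighter clusters, and lowering it groups more loosely similar records, which is the kind of trade-off a data steward tunes when refining matching rules.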
Profiling and Notifications
Data profiling provides data stewards with real-time statistics and information about the data that DQS is processing for cleansing or matching while a data quality project is running. Data profiling helps you assess the effectiveness of the cleansing and matching activities in a data quality project, and notifications point the user to actions that can improve those activities. For more information, see Data Profiling and Notifications in DQS.
For more information about data quality projects in DQS, see Data Quality Projects (DQS).
A DQS administrator can perform a variety of administrative tasks using the Data Quality Client application.
Activity monitoring displays the status and state of each activity performed within a data range, provides data for each activity, and enables DQS administrators to control an activity. For more information, see Monitor DQS Activities.
The Configuration option enables you to:
Configure reference data service settings. For more information, see Configure DQS to Use Reference Data.
Set the threshold values for the cleansing and matching activities. For more information, see Configure Threshold Values for Cleansing and Matching.
Enable/disable profiling notifications. For more information, see Enable or Disable Profiling Notifications in DQS.
Configure severity levels for the DQS log files at the activity-based level or the more advanced module-based level. For more information, see Configure Severity Levels for DQS Log Files.
DQS is secured through roles in the SQL Server security mechanism. Three DQS roles determine the access level for a user in the Data Quality Client application: dqs_administrator, dqs_kb_editor, and dqs_kb_operator. You cannot grant these roles to users from the Data Quality Client application; you must use SQL Server Management Studio instead. For more information, see DQS Security.
For more information about DQS administration, see DQS Administration.