Applies to: SQL Server, SSIS Integration Runtime in Azure Data Factory
The DQS Cleansing transformation uses Data Quality Services (DQS) to correct data from a connected data source by applying approved rules that were created for that data source or a similar one. For more information about data correction rules, see DQS Knowledge Bases and Domains. For more information about DQS, see Data Quality Services Concepts.
To determine whether the data has to be corrected, the DQS Cleansing transformation processes data from an input column when the following conditions are true:
The column is selected for data correction.
The column data type is supported for data correction.
The column is mapped to a domain that has a compatible data type.
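The three conditions above can be read as a single predicate over an input column. The following is a minimal sketch, not the transformation's actual logic: the `InputColumn` and `Domain` classes, the `SUPPORTED_TYPES` set, and the rule that "compatible" means "equal type name" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical type names; the real list of supported types is defined by DQS.
SUPPORTED_TYPES = {"string", "integer", "decimal", "date"}


@dataclass
class Domain:
    """Hypothetical stand-in for a DQS knowledge base domain."""
    name: str
    data_type: str


@dataclass
class InputColumn:
    """Hypothetical stand-in for a data flow input column."""
    name: str
    data_type: str
    selected: bool = False
    domain: Optional[Domain] = None


def is_processed(col: InputColumn) -> bool:
    """Return True only when all three documented conditions hold."""
    return (
        col.selected                                # selected for data correction
        and col.data_type in SUPPORTED_TYPES        # data type is supported
        and col.domain is not None                  # mapped to a domain ...
        and col.domain.data_type == col.data_type   # ... with a compatible type (assumed: equal)
    )
```

For example, a selected `string` column mapped to a `string` domain passes the check, while the same column with no domain mapping does not.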
The transformation also includes an error output that you configure to handle row-level errors. To configure the error output, use the DQS Cleansing Transformation Editor.
You can include the Fuzzy Grouping Transformation in the data flow to identify rows of data that are likely to be duplicates.
Data Quality Projects and Values
When you process data with the DQS Cleansing transformation, a cleansing project is created on the Data Quality Server. You use the Data Quality Client to manage the project. In addition, you can use the Data Quality Client to import the project values into a DQS knowledge base domain. You can import the values only to a domain (or linked domain) that the DQS Cleansing transformation was configured to use.
Use the DQS Cleansing Transformation Editor dialog box to correct data using Data Quality Services (DQS). For more information, see Data Quality Services Concepts.
Data Quality Knowledge Base
Select an existing DQS knowledge base for the connected data source. For more information about the DQS knowledge base, see DQS Knowledge Bases and Domains.
Encrypt connection
Specify whether to encrypt the connection so that data transferred between the DQS Server and Integration Services is encrypted.
Available domains
Lists the available domains for the selected knowledge base. There are two types of domains: single domains, and composite domains that contain two or more single domains.
Configure Error Output
Specify how to handle row-level errors. Errors can occur when the transformation corrects data from the connected data source, due to unexpected data values or validation constraints.
The following are the valid values:
Fail Component, which indicates that the transformation fails and the input data is not inserted into the Data Quality Services database. This is the default value.
Redirect Row, which indicates that the input data is not inserted into the Data Quality Services database and is redirected to the error output.
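The difference between the two error dispositions can be sketched as follows. This is a conceptual model only, not Integration Services code: `ErrorDisposition`, `ComponentFailed`, `run`, and `strict_upper` are hypothetical names, and a `ValueError` stands in for a row-level error.

```python
from enum import Enum


class ErrorDisposition(Enum):
    FAIL_COMPONENT = "Fail Component"   # default value
    REDIRECT_ROW = "Redirect Row"


class ComponentFailed(Exception):
    """Hypothetical signal that the whole transformation failed."""


def run(rows, cleanse, disposition=ErrorDisposition.FAIL_COMPONENT):
    """Apply `cleanse` to each row, handling row-level errors per `disposition`."""
    output, error_output = [], []
    for row in rows:
        try:
            output.append(cleanse(row))
        except ValueError as exc:
            if disposition is ErrorDisposition.FAIL_COMPONENT:
                # The component fails as a whole; no further rows are processed.
                raise ComponentFailed(str(exc)) from exc
            # Redirect Row: the failing row goes to the error output instead.
            error_output.append(row)
    return output, error_output


def strict_upper(value):
    """Hypothetical cleansing step that rejects empty values."""
    if not value:
        raise ValueError("empty value")
    return value.upper()
```

With `Redirect Row`, the data flow continues and the failing rows can be inspected downstream; with `Fail Component`, the first row-level error stops the transformation.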
Available Input Columns
Lists the columns from the connected data source. Select one or more columns that contain data that you want to correct.
Input Column
Lists an input column that you selected in the Available Input Columns area.
Domain
Select a domain to map to the input column.
Source Alias
Lists the source column that contains the original column value.
Click in the field to modify the column name.
Output Alias
Lists the column that is output by the DQS Cleansing transformation. The column contains the original column value or the corrected value.
Click in the field to modify the column name.
Status Alias
Lists the column that contains status information for the corrected data. Click in the field to modify the column name.
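To make the relationship between the three aliases concrete, the sketch below builds one output row for a single mapped input column. The `_Source`, `_Output`, and `_Status` suffixes and the status strings are assumptions used for illustration; the actual names are whatever appears (or is edited) in the Source Alias, Output Alias, and Status Alias fields.

```python
def default_aliases(input_column: str) -> dict:
    """Build hypothetical alias names for one mapped input column."""
    return {
        "source": f"{input_column}_Source",
        "output": f"{input_column}_Output",
        "status": f"{input_column}_Status",
    }


def build_output_row(input_column, original, corrected=None, status="Unchanged"):
    """Return the three output columns for one value.

    The output column carries the corrected value when one exists,
    otherwise the original value. Status strings here are placeholders.
    """
    aliases = default_aliases(input_column)
    return {
        aliases["source"]: original,
        aliases["output"]: corrected if corrected is not None else original,
        aliases["status"]: status,
    }
```

For a column named `City`, this yields the keys `City_Source`, `City_Output`, and `City_Status`, with `City_Output` falling back to the original value when no correction was made.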
Set options on the Advanced tab
Standardize output
Indicate whether to output the data in the standardized format based on the output format defined for domains. For more information about standardized format, see Data Cleansing.
Confidence
Indicate whether to include the confidence level for corrected data. The confidence level indicates how certain DQS is about the correction or suggestion. For more information about confidence levels, see Data Cleansing.
Reason
Indicate whether to include the reason for the data correction.
Appended Data
Indicate whether to output additional data that is received from an existing reference data provider. For more information, see Reference Data Services in DQS.