Import Domains from an Excel File in Knowledge Discovery
This topic describes how to import one or more domains from an Excel file in the Data Quality Services (DQS) knowledge discovery activity. The import process simplifies the knowledge generation process, saving time and effort. It enables people who have data in an Excel file or a text file to create a knowledge base with that data. (See Import Values from an Excel File into a Domain for more information about importing values into a domain of an existing knowledge base.) Exporting to an Excel file is not supported.
To import domains from an Excel file, Excel must be installed on the computer that the Data Quality Client is installed on; you must have created an Excel file with domain values (see How the import works); and you must have created and opened a knowledge base to import the domain into.
Start Data Quality Client. For information about doing so, seeRun the Data Quality Client Application.
In the Data Quality Client home screen, do one of the following:
Create a new knowledge base to import into by clicking New knowledge base, entering a name for the knowledge base, selecting None for Create knowledge base from, selecting the Knowledge Discovery activity, and then clicking Create.
Open an existing knowledge base to import into by clicking Open knowledge base, selecting the knowledge base, selecting Knowledge Discovery, and then clicking Next.
In the Map page, select Excel File for Data Source.
Click Browse on the Excel File line.
In the Select an Excel File dialog box, move to the folder that contains the Excel file that you want to import from, select the Excel file, and then click Open.
From the Worksheet drop-down list, select the worksheet in the Excel file that you want to import from.
Select Use First Row as header if you want the first row to be considered a data header, and if you want the values in the first row to be used as column names. Deselect Use First Row as header if you want the first row to be considered a data value, in which case DQS will use the Excel header names (alphabetical letters) for the column.
Select a column, and then either map an existing domain to the column, or create a new domain by clicking the Create a Domain icon, creating a domain in the Create a domain dialog box, and then mapping the domain to the column. The data type of the domain must match the data type of the column. Repeat for all columns of the spreadsheet.
In the Discover page, click Start to analyze the data in the Excel spreadsheet.
If you leave the page before the data has been uploaded, the file upload process will be terminated.
Verify that the analysis completed successfully, and then click Next.
In the Manage Domain Values page, verify that the correct domains are listed in the Domains list and that values are entered in the domain table.
Click Finish, and then click Publish to publish the knowledge base, or No not to publish.
Verify that the knowledge base was published, and then click OK.
After you import domains from an Excel file, you can add knowledge to the domains or use the domains in a cleansing or matching project, depending on the contents of the domains. For more information, see Perform Knowledge Discovery, Managing a Domain, Managing a Composite Domain, Create a Matching Policy, Data Cleansing, or Data Matching.
In the import operation, DQS interprets an Excel file as follows:
A column represents a domain
A row represents a data record
The first row either represents domain names or is the first data value or record, depending upon the setting of the Use First Row as header checkbox.
The following rules apply to the import operation:
This operation imports domain values into a knowledge base. It does not import domain rules or a matching policy.
The Excel file can have the extension .xlsx, .xls, or .csv. Microsoft Excel must be installed on the Data Quality Client computer to import domain values or a complete domain. Excel versions 2003 and later are supported. If the 64-bit version of Excel is used, only Excel 2003 files will be supported; Excel 2007 or 2010 files will not be supported.
Excel files of type .xlsx are not supported for an Excel 64-bit installation. If you are using 64-bit Excel, save the spreadsheet file as an .xls file.
In .xlsx and .xls files, the data type of the column is determined by the most prevalent data type in the first eight rows. If a cell does not conform to that data type, it will be given a null value.
In .csv files, the data type is determined by the most prevalent data type in the first eight rows.
A value in an Excel spreadsheet that does not conform to a domain rule will be imported as an invalid value.
If the Excel file is not in the right format or is corrupted, the import operation will result in an error.