Data Transformation Services (DTS) in Microsoft SQL Server 2000
Diane Larsen, author
Euan Garden, contributor
Note: DTS has been replaced by SQL Server Integration Services (SSIS) in SQL Server 2005. To learn about SSIS, read An Introduction to SQL Server 2005 Integration Services on Microsoft TechNet.
Summary: Database administrators often import, export, and transform data in support of tasks such as data consolidation, archiving, and analysis; for application development purposes; and for database or server upgrades. Data Transformation Services (DTS) in SQL Server 2000 provides a set of graphical tools and programmable objects to help administrators and developers solve data movement problems, including the extraction, transformation, and consolidation of data from disparate sources to single or multiple destinations. Sets of tasks, workflow operations, and constraints can be collected as DTS packages that can be scheduled to run periodically or when certain events occur. This white paper introduces DTS, shows some of the components and services that can be used to create DTS solutions, illustrates the use of DTS Designer to implement DTS solutions, and introduces DTS application development.
Introduction to DTS
What Is DTS?
What's New in DTS?
Using DTS Designer
Tasks: Defining Steps in a Package
Workflows: Setting Task Precedence
Connections: Accessing and Moving Data
The Data Pump: Transforming Data
Options for Saving DTS Packages
DTS as an Application Development Platform
For More Information
Most organizations have multiple formats and locations in which data is stored. To support decision-making, improve system performance, or upgrade existing systems, data often must be moved from one data storage location to another.
Microsoft® SQL Server™ 2000 Data Transformation Services (DTS) provides a set of tools that lets you extract, transform, and consolidate data from disparate sources into single or multiple destinations. By using DTS tools, you can create custom data movement solutions tailored to the specialized needs of your organization, as shown in the following scenarios:
- You have deployed a database application on an older version of SQL Server or another platform, such as Microsoft Access. A new version of your application requires SQL Server 2000, and requires you to change your database schema and convert some data types.
To copy and transform your data, you can build a DTS solution that copies database objects from the original data source into a SQL Server 2000 database, while at the same time remapping columns and changing data types. You can run this solution using DTS tools, or you can embed the solution within your application.
- You must consolidate several key Microsoft Excel spreadsheets into a SQL Server database. Several departments create the spreadsheets at the end of the month, but there is no set schedule for completion of all the spreadsheets.
To consolidate the spreadsheet data, you can build a DTS solution that runs when a message is sent to a message queue. The message triggers DTS to extract data from the spreadsheet, perform any defined transformations, and load the data into a SQL Server database.
- Your data warehouse contains historical data about your business operations, and you use Microsoft SQL Server 2000 Analysis Services to summarize the data. Your data warehouse needs to be updated nightly from your Online Transaction Processing (OLTP) database. Your OLTP system is in-use 24-hours a day, and performance is critical.
You can build a DTS solution that uses the file transfer protocol (FTP) to move data files onto a local drive, loads the data into a fact table, and aggregates the data using Analysis Services. You can schedule the DTS solution to run every night, and you can use the new DTS logging options to track how long this process takes, allowing you to analyze performance over time.
DTS is a set of tools you can use to import, export, and transform heterogeneous data between one or more data sources, such as Microsoft SQL Server, Microsoft Excel, or Microsoft Access. Connectivity is provided through OLE DB, an open-standard for data access. ODBC (Open Database Connectivity) data sources are supported through the OLE DB Provider for ODBC.
You create a DTS solution as one or more packages. Each package may contain an organized set of tasks that define work to be performed, transformations on data and objects, workflow constraints that define task execution, and connections to data sources and destinations. DTS packages also provide services, such as logging package execution details, controlling transactions, and handling global variables.
These tools are available for creating and executing DTS packages:
- The Import/Export Wizard is for building relatively simple DTS packages, and supports data migration and simple transformations.
- The DTS Designer graphically implements the DTS object model, allowing you to create DTS packages with a wide range of functionality.
- DTSRun is a command-prompt utility used to execute existing DTS packages.
- DTSRunUI is a graphical interface to DTSRun, which also allows the passing of global variables and the generation of command lines.
- SQLAgent is not a DTS application; however, it is used by DTS to schedule package execution.
Using the DTS object model, you also can create and run packages programmatically, build custom tasks, and build custom transformations.
Microsoft SQL Server 2000 introduces several DTS enhancements and new features:
- New DTS tasks include the FTP task, the Execute Package task, the Dynamic Properties task, and the Message Queue task.
- Enhanced logging saves information for each package execution, allowing you to maintain a complete execution history and view information for each process within a task. You can generate exception files, which contain rows of data that could not be processed due to errors.
- You can save DTS packages as Microsoft Visual Basic® files.
- A new multiphase data pump allows advanced users to customize the operation of data transformations at various stages. Also, you can use global variables as input parameters for queries.
- You can use parameterized source queries in DTS transformation tasks and the Execute SQL task.
- You can use the Execute Package task to dynamically assign the values of global variables from a parent package to a child package.
DTS Designer graphically implements the DTS object model, allowing you to graphically create DTS packages. You can use DTS Designer to:
- Create a simple package containing one or more steps.
- Create a package that includes complex workflows that include multiple steps using conditional logic, event-driven code, or multiple connections to data sources.
- Edit an existing package.
The DTS Designer interface consists of a work area for building packages, toolbars containing package elements that you can drag onto the design sheet, and menus containing workflows and package management commands.
Figure 1. DTS Designer interface
By dragging connections and tasks onto the design sheet, and specifying the order of execution with workflows, you can easily build powerful DTS packages using DTS Designer. The following sections define tasks, workflows, connections, and transformations, and illustrate the ease of using DTS Designer to implement a DTS solution.
A DTS package usually includes one or more tasks. Each task defines a work item that may be performed during package execution. You can use tasks to:
- Transform data
Transform Data task Use to move data between a source and destination and to optionally apply column-level transformations to the data. Data Driven Query task Use to perform flexible, Transact-SQL–based operations on data, including stored procedures and INSERT, UPDATE, or DELETE statements. Parallel Data Pump task1 Available programmatically only, the Parallel Data Pump task performs the same functions as the Transform Data and Data Driven Query tasks, but supports chaptered rowsets as defined by OLE DB 2.5 and later.
- Copy and manage data
Bulk Insert task Use to quickly load large amounts of data into a SQL Server table or view. Execute SQL task Use to run SQL statements during package execution. The Execute SQL task also can save data that is the result of a query. Copy SQL Server Objects task Use to copy SQL Server objects from one installation or instance of SQL Server to another. You can copy objects such as data and tables, as well as the definitions of objects such as views and stored procedures. Transfer Database task1 Use to move or copy a SQL Server database from an instance of SQL Server version 7.0 or SQL Server 2000 to an instance of SQL Server 2000. Transfer Error Messages task1 Use to copy user-specified error messages, created by the sp_addmessage system stored procedure, from an instance of SQL Server 7.0 or SQL Server 2000 to an instance of SQL Server 2000. Transfer Logins task1 Use to copy logins from an instance of SQL Server 7.0 or SQL Server 2000 to an instance of SQL Server 2000. Transfer Jobs task1 Use to copy jobs from an instance of SQL Server 7.0 or SQL Server 2000 to an instance of SQL Server 2000. Transfer Master Stored Procedures task1 Use to copy stored procedures from a master database on an instance of SQL Server 7.0 or SQL Server 2000 to the master database on an instance of SQL Server 2000.
- Run tasks as jobs from within a package
ActiveX Script task Use to write code to perform functions that are not available in the other DTS tasks. Dynamic Properties task1 Use to retrieve values from sources outside a DTS package at package run time and assign those values to selected package properties. Execute Package task1 Use to run other DTS packages from within a package. Execute Process task Use to run an executable program or batch file. File Transfer Protocol (FTP) task1 Use to download data files from a remote server or an Internet location. Message Queue task1 Use to send and receive messages from Microsoft Message Queues. Send Mail task Use to send an e-mail message. Analysis Services Processing task2 Use to perform processing of one or more objects defined in SQL Server 2000 Analysis Services. Data Mining task1,2 Use to create a prediction query and an output table from a data mining model object defined in SQL Server 2000 Analysis Services.
1 New in SQL Server 2000.
2 Available only when SQL Server 2000 Analysis Services is installed.
You also can create custom tasks programmatically, and then integrate them into DTS Designer using the Register Custom Task command.
To illustrate the use of tasks, here is a simple DTS Package with two tasks: a Microsoft ActiveX® Script task and a Send Mail task:
Figure 2. DTS Package with two tasks
The ActiveX Script task can host any ActiveX Scripting engine including Microsoft Visual Basic Scripting Edition (VBScript), Microsoft JScript®, or ActiveState ActivePerl, which you can download from http://www.activestate.com. The Send Mail task may send a message indicating that the package has run. Note that there is no order to these tasks yet. When the package executes, the ActiveX Script task and the Send Mail task run concurrently.
When you define a group of tasks, there is usually an order in which the tasks should be performed. When tasks have an order, each task becomes a step of a process. In DTS Designer, you manipulate tasks on the DTS Designer design sheet and use precedence constraints to control the sequence in which the tasks execute.
Precedence constraints sequentially link tasks in a package. The following table shows the types of precedence constraints you can use in DTS.
|If you want Task 2 to wait until Task 1 completes, regardless of the outcome, link Task 1 to Task 2 with an On Completion precedence constraint.|
|If you want Task 2 to wait until Task 1 has successfully completed, link Task 1 to Task 2 with an On Success precedence constraint.|
|If you want Task 2 to begin execution only if Task 1 fails to execute successfully, link Task 1 to Task 2 with an On Failure precedence constraint.|
The following illustration shows the ActiveX Script task and the Send Mail task with an On Completion precedence constraint. When the Active X Script task completes, with either success or failure, the Send Mail task runs.
Figure 3. ActiveX Script task and the Send Mail task with an On Completion precedence constraint
You can configure separate Send Mail tasks, one for an On Success constraint and one for an On Failure constraint. The two Send Mail tasks can send different messages based on the success or failure of the ActiveX script.
Figure 4. Mail tasks
You also can issue multiple precedence constraints on a task. For example, the Send Mail task "Admin Notification" could have both an On Success constraint from Script #1 and an On Failure constraint from Script #2. In these situations, DTS assumes a logical "AND" relationship. Therefore, Script #1 must successfully execute and Script #2 must fail for the Admin Notification message to be sent.
Figure 5. Example of multiple precedence constraints on a task
To successfully execute DTS tasks that copy and transform data, a DTS package must establish valid connections to its source and destination data and to any additional data sources, such as lookup tables.
When creating a package, you configure connections by selecting a connection type from a list of available OLE DB providers and ODBC drivers. The types of connections that are available are:
- Microsoft Data Access Components (MDAC) drivers
Microsoft OLE DB Provider for SQL Server Microsoft Data Link Microsoft ODBC Driver for Oracle
- Microsoft Jet drivers
dBase 5 Microsoft Access HTML File (Source) Microsoft Excel 97-2000 Paradox 5.X
- Other drivers
Text File (Source) Text File (Destination) Other Connection
DTS allows you to use any OLE DB connection. The icons on the Connections toolbar provide easy access to common connections.
The following illustration shows a package with two connections. Data is being copied from an Access database (the source connection) into a SQL Server production database (the destination connection).
Figure 6. Example of a package with two connections
The first step in this package is an Execute SQL task, which checks to see if the destination table already exists. If so, the table is dropped and re-created. On the success of the Execute SQL task, data is copied to the SQL Server database in Step 2. If the copy operation fails, an e-mail is sent in Step 3.
The DTS data pump is a DTS object that drives the import, export, and transformation of data. The data pump is used during the execution of the Transform Data, Data Driven Query, and Parallel Data Pump tasks. These tasks work by creating rowsets on the source and destination connections, then creating an instance of the data pump to move rows between the source and destination. Transformations occur on each row as the row is copied.
In the following illustration, a Transform Data task is used between the Access DB task and the SQL Production DB task in Step 2. The Transform Data task is the gray arrow between the connections.
Figure 7. Example of a Transform Data task
To define the data gathered from the source connection, you can build a query for the transformation tasks. DTS supports parameterized queries, which allow you to define query values when the query is executed.
You can type a query into the task's Properties dialog box, or use the Data Transformation Services Query Designer, a tool for graphically building queries for DTS tasks. In the following illustration, the Query Designer is used to build a query that joins three tables in the pubs database.
In the transformation tasks, you also define any changes to be made to data. The following table describes the built-in transformations that DTS provides.
|Copy Column||Use to copy data directly from source to destination columns, without any transformations applied to the data.|
|ActiveX Script||Use to build custom transformations. Note that since the transformation occurs on a row-by-row basis, an ActiveX script can affect the execution speed of a DTS package.|
|DateTime String||Use to convert a date or time in a source column to a different format in the destination column.|
|Lowercase String||Use to convert a source column to lowercase characters and, if necessary, to the destination data type.|
|Uppercase String||Use to convert a source column to all uppercase characters and, if necessary, to the destination data type.|
|Middle of String||Use to extract a substring from the source column, transform it, and copy the result to the destination column.|
|Trim String||Use to remove leading, trailing, and embedded white space from a string in the source column and copy the result to the destination column.|
|Read File||Use to open the contents of a file, whose name is specified in a source column, and copy the contents into a destination column.|
|Write File||Use to copy the contents of a source column (data column) to a file whose path is specified by a second source column (file name column).|
You can also create your own custom transformations programmatically. The quickest way to build custom transformations is to use the Active Template Library (ATL) custom transformation template, which is included in the SQL Server 2000 DTS sample programs.
Data pump error logging
A new method of logging transformation errors is available in SQL Server 2000. You can define three exception log files for use during package execution: an error text file, a source error rows file, and a destination error rows file.
- General error information is written to the error text file.
- If a transformation fails, then the source row is in error, and that row is written to the source error rows file.
- If an insert fails, then the destination row is in error, and that row is written to the destination error rows file.
The exception log files are defined in the tasks that transform data. Each transformation task has its own log files.
Data pump phases
By default, the data pump has one phase: row transformation. That phase is what you configure when mapping column-level transformations in the Transform Data task, Data Driven Query task, and Parallel Data Pump task, without selecting a phase.
Multiple data pump phases are new in SQL Server 2000. By selecting the multiphase data pump option in SQL Server Enterprise Manager, you can access the data pump at several points during its operation and add functionality.
When copying a row of data from source to a destination, the data pump follows the basic process shown in the following illustration.
After the data pump processes the last row of data, the task is finished and the data pump operation terminates.
Advanced users who want to add functionality to a package so that it supports any data pump phase can do so by:
- Writing an ActiveX script phase function for each data pump phase to be customized. If you use ActiveX script functions to customize data pump phases, no additional code outside of the package is required.
- Creating a COM object in Microsoft Visual C++® to customize selected data pump phases. You develop this program external to the package, and the program is called for each selected phase of the transformation. Unlike the ActiveX script method of accessing data pump phases, which uses a different function and entry point for each selected phase, this method provides a single entry point that is called by multiple data pump phases, while the data pump task executes.
These options are available for saving DTS packages:
- Microsoft SQL Server
Save your DTS package to Microsoft SQL Server if you want to store packages on any instance of SQL Server on your network, keep a convenient inventory of those packages, and add and delete package versions during the package development process.
- SQL Server 2000 Meta Data Services
Save your DTS package to Meta Data Services if you plan to track package version, meta data, and data lineage information.
- Structured storage file
Save your DTS package to a structured storage file if you want to copy, move, and send a package across the network without having to store the package in a Microsoft SQL Server database.
- Microsoft Visual Basic
Save your DTS package that has been created by DTS Designer or the DTS Import/Export Wizard to a Microsoft Visual Basic file if you want to incorporated it into Visual Basic programs or use it as a prototype for DTS application development.
The DTS Designer provides a wide variety of solutions to data movement tasks. DTS extends the number of solutions available by providing programmatic access to the DTS object model. Using Microsoft Visual Basic, Microsoft Visual C++, or any other application development system that supports COM, you can develop a custom DTS solution for your environment using functionality unsupported in the graphical tools.
DTS offers support for the developer in several different ways:
- Building packages
You can develop extremely complex packages and access the full range of functionality in the object model, without the using the DTS Designer or DTS Import/Export Wizard.
- Extending packages
You can add new functionality through the construction of custom tasks and transforms, customized for your business and reusable within DTS.
- Executing packages
Execution of DTS packages does not have to be from any of the tools provided, it is possible to execute DTS packages programmatically and display progress through COM events, allowing the construction of embedded or custom DTS execution environments.
Sample DTS programs are available to help you get started with DTS programming. The samples can be installed with SQL Server 2000.
If you develop a DTS application, you can redistribute the DTS files. For more information, see Redist.txt on the SQL Server 2000 compact disc.
Microsoft SQL Server 2000 Books Online contains more information about DTS, using the DTS applications, and building custom solutions. For additional information, see these resources:
- Microsoft SQL Server Web site.
- Microsoft SQL Server Developer Center.
- SQL Server Magazine.
- Microsoft.public.sqlserver.server and microsoft.public.sqlserver.datawarehouse newsgroups at news://news.microsoft.com.
- Microsoft Official Curriculum courses on SQL Server. For up-to-date course information, see the Microsoft Training and Services Web site.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
©2000 Microsoft Corporation. All rights reserved.
Microsoft, ActiveX, JScript, Visual Basic, and Visual C++ are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.