Data Transformation Services for SQL Server 2005-Frequently Asked Design Questions

 

Huntting Buckley, Donald Farmer, Amit Goyal, Craig Guyer, James Howey
Microsoft Corporation

July 2004

Applies to:
   Microsoft SQL Server 2005

Summary: This paper answers some of the questions most frequently asked about Data Transformation Services in SQL Server 2005. In particular it addresses some of the design issues–it is a "why" FAQ rather than a "how" FAQ. (6 printed pages)

Contents

Why Does the Wizard Fail When Transferring Data for Several Thousands of Tables?
Why Can I Not Copy Objects in the Import / Export Wizard?
Why Can I Not Design Packages in the SQL Server Management Studio?
What Are Data Sources and Data Source Views? How Do They Relate to Connections in My Package?
Why Is There No Microsoft Excel or Microsoft Access Data Source?
Why Is the Fast Parse Option Not Used by Default in the Data Flow?
Why Do We Have Separate WMI Data Tasks and Event Tasks?
Why Is the Expression Language Not SQL or Visual Basic?
Why Are There Functions Missing from the Expression Language?
Why Do I Have to Configure Sort Order at the Source Adapter, Rather than at the Consuming Component, for Example Merge?
Why Do I Have to Deal with Having to Lock Variables in My Package?
Why Is There No Notification to the Component When an Output Path Is Detached?
Why Is There a Windows Service for DTS?
Why Am I Not Able to See Running Packages Under My DTS Server in Mgt Studio?
Why Is Package Logging No Longer in the System Database MSDB by Default?
Why Aren't Changes in the Schema of an XML Document Seen in the Source Adapter?
Why Am I Having Problems Migrating Packages That Reference DTSGlobalVariables Parent?
Why Am I Having Problems Migrating Data Pump Tasks?
Why Can I Not Migrate a Dynamic Properties Task?
Why Aren't Global Variables Available in the ActiveX Script Task?
Why Don't We Include the Data Transformation Task and the Data Driven Query Task with DTS?
Why Is DTS No Longer Client-Side Redistributable?
Why Does DTS Not Support an MSI-Based Deployment?

Why Does the Wizard Fail When Transferring Data for Several Thousands of Tables?

Current wizard architecture creates a DTS package data flow to transfer data. There are scalability limits for several thousands of tables. We feel this is reasonable since package designers are not likely to use several thousands of sources and destinations in one data flow. There may be some changes to this in Beta 3.

Why Can I Not Copy Objects in the Import / Export Wizard?

The Import / Export wizard is designed to enable the user to easily move data to and from SQL Server. Copying database objects between servers is really a management operation rather than a data movement operation, even though data is, in fact, moved.

The Copy Database Wizard in the SQL Server Management workbench enables users to copy database objects, and is specialized for doing this.

The DTS Import / Export wizard, on the other hand, simplifies typical DTS operations–loading data from diverse sources including flat files, spreadsheets and OLEDB data sources.

Why Can I Not Design Packages in the SQL Server Management Studio?

The SQL Server Management Studio is an environment for managing the storage and execution of deployed packages. It has special features for doing this, including integration with the DTS Service, and the ability to enumerate packages on remote servers. But it is not a design environment.

The Business Intelligence Design Studio is an environment for designing packages, organizing them in Solutions and Projects, debugging them and managing source and version control for multi-user projects.

So in SQL Server 2005, users design packages in a specialist environment and manage them in a separate specialist environment.

More information on designing and managing packages, and on deploying packages between environments, can be found in Books Online.

What Are Data Sources and Data Source Views? How Do They Relate to Connections in My Package?

Data Sources and Data Source Views are not objects within your DTS package, instead they are separate objects saved in a Data Transformation Project in the Business Intelligence Development Studio. As they are in the project and not in a DTS Package, they can be shared between DTS Packages allowing a Data Source View to be defined for a star schema and used consistently by every package which loads that star schema.

Data Sources define how to connect to an OLEDB or ADO.NET database. They hold the connection string, and some other properties including a friendly name for the connection and a description field.

Data Source Views are subsets of a relational schema that is pointed to by a Data Source. These views can be elaborated with named queries, calculated columns, virtual relationships and friendly names. For example, when working with a large schema you may only want to use a small set of tables from the source; using a Data Source View enables you to define this subset.

Data Sources and Data Source Views are design-time objects which can only be used within BI Development Studio projects. At design-time, a Data Source can be used by one or more DTS Packages in a project, but as at runtime a deployed DTS Package will not have access to these design-time objects. DTS Packages contain Connections, which enable the package designer to manage connectivity to databases, files, and so on. At design-time a Connection can reference a Data Source or a Data Source View, but at runtime the DTS Connection has all the necessary metadata to be used stand-alone. This way the user can design different packages which share Data Source references and Data Source Views, and then run them independently of each other and deploy them separately.

If a DTS Connection references a Data Source, then an object using the DTS Connection can use any Data Source View which also uses the Data Source. In this way, a DTS Package can easily refer to only a small subset of a large schema.

Finally, DTS Connections are more generic than Data Sources–they can connect to text files, HTTP connections, FTP connections and many other scenarios where a connection string is needed–not just to OLEDB and ADO.NET managed providers.

More information on Connections, Data Sources and Data Source Views is available in Books Online.

Why Is There No Microsoft Excel or Microsoft Access Data Source?

Excel and Access data files can be imported using a properly configured OLEDB Connection Manager. You can also easily use the DTS Import / Export Wizard which will build the Connection Manager for you.

For more information search Books Online for "Creating a Package Using the DTS Import/Export Wizard."

Why Is the Fast Parse Option Not Used by Default in the Data Flow?

The Fast Parse option is opportunistic. It doesn't support locale-specific parsing or date formats other than year-month-date. It is only useful for limited data formats, for which faster parsing is possible compared to the default standard parse. Therefore, users have to explicitly choose the option based on their data format.

For more information search Books Online for "Fast Parse."

Why Do We Have Separate WMI Data Tasks and Event Tasks?

System management data can be useful for influencing a DTS control flow. For example, before running an ETL flow, you can check whether SQL Server is running, whether there is a C: drive on the machine or if there is enough disk space available. WMI data task is used to retrieve such system data.

System management events can be useful for taking real-time actions within a DTS control flow. For example, only run the ETL flow when memory usage drops below 50% or only start a data flow after a new source file is available on the file system. The WMI event task allows definition and handling of such system events.

These two tasks are also fundamentally different in the functionality they provide. In this release we have been more aggressive about separating tasks based on the features they provide to support granular building of packages. Another example of this separation is the File System Task and the FTP Task. The features they provided were combined into one task in DTS 2000, but in this release have been separated to make the packages easier to build, understand and support.

Why Is the Expression Language Not SQL or Visual Basic?

Neither SQL nor Visual Basic meets the requirements of the DTS data flow. For example, we wanted to support a range of operators such as bitwise operators manipulating bitmasks that are used in many scenarios as flags. Also, in order to make best use of memory (and thus get the best performance) the DTS data flow is strict about data types–this is also reflected in the expression syntax.

The DTS expression syntax enables you to design high-performance data flows with complex handling of strings and data types. If there are functions you need which are not supported by the expression language, or if you would prefer to use Visual Basic syntax, you can use the Script Component which enables you to work with Visual Basic .NET in the data flow.

Why Are There Functions Missing from the Expression Language?

Resources didn't permit implementation of every function now found in SQL Server TSQL. We surveyed customers and developers to assemble our initial set, and will attend carefully to beta feedback as we prepare for future betas and eventual release.

Why Do I Have to Configure Sort Order at the Source Adapter, Rather than at the Consuming Component, for Example Merge?

The order of rows is set at the source adapter and cannot be changed by most downstream transforms (Sort would be one exception). Source adapters that are cognizant of the order of sourced data can set this information without user assistance, but in most cases insufficient metadata is provided to a source component to allow it to set output column sort information; as such, the user becomes responsible for this effort.

Why Do I Have to Deal with Having to Lock Variables in My Package?

The DTS runtime uses multiple threads for execution when appropriate to improve concurrency and by doing so improve performance and scalability. In order to avoid having more than one thread attempt to access the same variable at the same time, all access to variables must be made "safe" by locking them for your operation; not doing this could result in packages with a range of responses, from ones that "sometimes don't work" to ones that return completely incorrect results. This can be an issue even if the package is completely linear: for example, if the user were to change the concurrency setting in a loop.

Why Is There No Notification to the Component When an Output Path Is Detached?

DTS Designer users should be able to delete a path, then reattach it, without breaking their package or changing its behavior. By providing an OnOutputPathDetached method, we would allow ill-behaved components to break this understanding. To satisfy component authors who wish to provide a variable number of outputs, the DTS object model provides a property on the output called DeleteOutputOnPathDetached that will auto-delete the output when the path is detached if the property is set to true.

Why Is There a Windows Service for DTS?

The new Windows service for DTS will allow an operator to see and stop DTS packages running on a particular server. The service is exposed inside the SQL Management Studio. Note: The DTS service needs to be started first as it is off by default.

See the FAQ topic "Why am I not able to see running packages under my DTS server in Mgt Studio?"

For more information search Books Online for "DTS Service" and "Managing DTS Service."

Why Am I Not Able to See Running Packages Under My DTS Server in Mgt Studio?

The DTS service is disabled by default. This behavior is by design so that the service is not running unless the system administrator wants it running. To start the DTS Service right-click on My computer and choose Manage (from either the Start button or Desktop), then inside the Computer Management window click Service and Applications and then Services to see all the Windows Services listed on the right side. Find DTS Server in the list and right-click to choose Start. (You can also double-click to open its properties and change the Startup Type to Automatic if you would like the service to automatically start each time the machine starts.)

For more information search Books Online for "DTS Service" and "Managing DTS Service."

Why Is Package Logging No Longer in the System Database MSDB by Default?

SQL 2005 DTS adds many new options for logging information about your database. This allows you to define one or more log providers per package that can be of different types. Log Provider Types include Text Files, SQL Server, Windows Event Log, and so on. Within your package you can add the log providers by going to the DTS menu and choosing Logging.

For more information search Books Online for "DTS Log Providers" and "Setting Logging Options in Packages."

Why Aren't Changes in the Schema of an XML Document Seen in the Source Adapter?

Changes in the schema of an XML file significantly affect the metadata of the data flow. This requires the user to explicitly change the schema and take account of the metadata changes.

Why Am I Having Problems Migrating Packages That Reference DTSGlobalVariables Parent?

The scope of changes incorporated in DTS 2005 required a complete object model rewrite. As a consequence after migration to SQL Server 2005 DTS, ActiveX script references to SQL Server 2000 DTS objects accessed via the DTSGlobalVariables Parent property will no longer be valid.

Why Am I Having Problems Migrating Data Pump Tasks?

The DTS 2005 Data Flow Task replaces the DTS 2000 Data Transformation Tasks, offering greatly enhanced functionality. As a result though, there is no unambiguous mapping between the 2000 and 2005 offerings. To ensure continued correct operation of legacy Data Transformation Tasks, they are not completely migrated to SQL Server 2005. The migration wizard currently wraps such a task in a dedicated SQL Server 2000 package, and invokes it from an Execute SQL Server 2000 DTS Package Task.

Why Can I Not Migrate a Dynamic Properties Task?

As this task is directly dependent on the no-longer-present DTS 2000 object model, the migration wizard currently offers only limited support for this task. Instances of this task are replaced with script tasks containing comments describing the properties once set by this task. Manual modification is required to restore prior functionality.

Why Aren't Global Variables Available in the ActiveX Script Task?

Access to global variables is not supported in Beta 2.

Why Don't We Include the Data Transformation Task and the Data Driven Query Task with DTS?

SQL Server 2005 includes the Data Flow task in place of these two SQL 2000 tasks. The Data Flow task places no limits on the number of sources, destinations and transforms, unlike in SQL 2000 where the user was limited to a singe source, destination, and transform per task. The new object model required to support this expanded functionality could not accommodate these two less robust tasks.

Why Is DTS No Longer Client-Side Redistributable?

DTS has been rewritten from a useful utility in SQL Server 2000 to a feature rich application and platform in SQL Server 2005. This change comes with a responsibility of providing service and support to DTS users best handled by managing and licensing the DTS engine functionality similar to other SQL server applications. Therefore, while users can easily deploy packages with the new Deployment wizard, they will be required to run the SQL Server 2005 setup and install DTS on each machine where they want to run the packages. This will not only help ensure the user/machine is using a correct version of the DTS executables but will also assist in updating the software in the future.

Why Does DTS Not Support an MSI-Based Deployment?

MSI-based deployment has the advantage of a complete stand-alone install, but it would require the re-distribution of DTS binaries. If an important security patch is required, it would be impossible to get the fix out to every deployment target in a managed fashion. The current deployment utility requires target servers to have an existing installation of DTS as this is a much more serviceable model for administrators.

For more information search Books Online for "Deployment Utility."