
DTS Programming Techniques Used in Microsoft SQL Server Accelerator for Business Intelligence

SQL Server 2000
 

Carl Rabeler and Dave Wickert

February 2003

Applies to:
    Microsoft® SQL Server™ Accelerator for Business Intelligence

Summary: Learn about the Data Transformation Services (DTS) programming techniques used in SQL Server Accelerator for Business Intelligence (BI Accelerator). Using these techniques, the Microsoft BI Practices team created an extensible data movement system for analytical applications that developers can extend and database administrators (DBAs) can manage. (20 printed pages)

Contents

Introduction
Data Movement System Design Goals
Data Movement System Design Challenges
DTS Tips and Tricks
DTS Programming Guidelines
Conclusion
Finding More Information

Introduction

Microsoft® SQL Server™ Accelerator for Business Intelligence (BI Accelerator) is designed to help developers rapidly create analytical application prototypes. With BI Accelerator, developers can provide schema information and then generate iterative prototypes of an analytical application programmatically. Each prototype includes a Staging database, a Subject Matter database, an Analysis database containing multidimensional cubes, application files, and a data movement system. The data movement system moves data from external sources into the Staging database, from the Staging database to the Subject Matter database, and from the Subject Matter database to the Analysis database. As data moves through these structures, Data Transformation Services (DTS) tasks perform data transformations, data lookups, error checking, and cube processing.

This paper uses the BI Accelerator data movement system to demonstrate the power and flexibility of DTS. The topics in this paper address the design goals that dictated the use of DTS, the particular problems that had to be solved to meet the BI Accelerator design goals, tips and tricks that the Microsoft BI Practices team used in designing the DTS packages, and the DTS programming guidelines that the BI Practices team followed in designing the data movement system. Understanding how the BI Accelerator data movement system was designed will improve your ability to design and build your own flexible DTS data movement system.

This paper is intended for developers who are familiar with SQL Server, DTS and SQL Server 2000 Analysis Services. An understanding of BI Accelerator is not required to understand the DTS programming concepts presented in this paper. However, readers who want more information about how the programming techniques discussed in this paper are implemented can download BI Accelerator.

Data Movement System Design Goals

Three major design goals dictated the use of DTS for the BI Accelerator data movement system:

  • All components of the analytical application must be generated programmatically, including the data movement system. This allows developers to quickly generate and test multiple iterations of the prototype.

    The packages in the BI Accelerator data movement system can be dynamically generated from code using schema information that developers provide.

  • Developers must be able to understand and extend the data movement system easily. Because each environment in which the analytical application runs is unique, the requirements for each environment cannot be completely anticipated. After completing the final iteration of the analytical application, developers must be able to configure, extend, and customize the data movement system before it is deployed.

    Developers can use DTS Designer to view and understand the BI Accelerator data movement system. Developers can then extend and customize the data movement system by modifying the packages that BI Accelerator generates or by adding new packages. Generated packages can generally be modified directly in DTS Designer. New packages can be generated directly in DTS Designer or in Microsoft Visual Basic®.

  • Database administrators (DBAs) must be able to understand, configure, and manage the data movement system easily using standard SQL Server tools. While the BI application is generated programmatically and can be extended by developers, DBAs must be able to manage the day-to-day operations of the analytical application using tools with which they are familiar.

    DBAs can also use DTS Designer to view and understand the BI Accelerator data movement system. DBAs can configure and manage the data movement system using global variable configurations, batch files, a Config.ini file, error tables, and log files, all of which can be managed with standard tools.

Data Movement System Design Challenges

To create a data movement system that met the design goals described in the previous section, the BI Practices team had to overcome several limitations that are inherent in DTS:

  • Connection information and application meta data is stored in each package. Server names and other connection information to source and destination data stores, by default, are embedded in the connection objects in each package. This means that each package must be edited individually each time this information changes.

    The analytical application meta data used in the data movement system poses a similar problem. By default, the meta data is either stored directly in DTS tasks or in global variables that are used by DTS tasks, which again, means that each package must be edited individually.

  • Global variable values are embedded in each package. Global variable values, by default, are embedded in each package, which generally means that each package must be edited individually.
  • Conditional logic is not directly supported. The DTS Data Driven Query task performs flexible Transact-SQL operations on rows of data flowing between tables. The parser in DTS Designer, however, does not support the use of conditional logic and parameterized, multi-table joins to generate the source rowset.
  • Information does not typically flow from child packages to parent packages. DTS is designed to pass information from a parent package to one or more child packages. But DTS is not designed to pass information from a child package back to the parent package or to another child package.
  • Branching is not directly supported. Branching enables a single package to include multiple paths that can be traversed based on the value of a global variable. Without branching, a separate package is required for each branch.
  • Looping is not directly supported. Looping enables a package to loop through a number of package steps using conditional logic. Without looping, a separate task is required for each step.
  • Task complexity. Microsoft ActiveX® tasks that contain too many properties increase the complexity of packages, making them difficult for developers to understand and access.
  • Package storage. The single, flat SQL Server namespace does not permit the storage of multiple packages with the same names inside a single instance of SQL Server. In addition, migrating packages stored in SQL Server from the development environment to the QA or production environment is difficult.

The following sections discuss these design problems and their solutions in detail.

Connection Information and Application Meta Data

DTS packages must establish valid connections to source and destination data stores. To accomplish this, connection objects within each package store connection information for the data stores the package uses. Tasks within the package use these connection objects when connecting to the data stores. Because the connection information for these data stores is embedded within connection objects, each connection object must be updated with new connection information when the data store changes.

The use of application meta data within the data movement system poses a similar problem. When this meta data, such as the path to the application files, changes, each task relying on the original file path must be updated.

For example, when a DTS data movement system is moved from the development environment to a QA or production environment, the source and destination data stores for all packages in the system must be updated. Application meta data may also require updating when the data movement system is moved. When a system has as many packages as the BI Accelerator data movement system, updating each package individually is a daunting and potentially error-prone task.

Solution

Fortunately, DTS provides several methods that enable connection information for a package to be retrieved at run time from a source outside of the package. These methods use the Data Link connection object or the Dynamic Properties task.

  • By using a Data Link connection object in a package, you can configure the package to retrieve connection properties from a Microsoft Data Link (*.udl) file at run time. If you place connection information used by multiple packages in a single file stored in the file system, you can change the connection information for multiple packages at the same time, without editing each package individually. The next time that a package runs that references the data link file, the package uses the updated connection information. A data link file, however, can only store connection information for a single data store.
  • By using a Dynamic Properties task, you can enable one or more values to be retrieved at run time from an external data store and then assigned to selected object properties within a package (such as connection object properties). An initialization (*.ini) file is one of the data stores from which values can be retrieved. An *.ini file is a text file in which you can place connection information and other information that package objects require. An *.ini file can store multiple types of information, including connection information for multiple data stores and initialization values for multiple global variables.

While the BI Accelerator data movement system could have used either of these mechanisms to set the connection properties for source and destination connection objects dynamically, the BI Practices team chose the Dynamic Properties task because it solves the connection information problem and the application meta data problem simultaneously.

Implementation

The BI Accelerator data movement system retrieves connection information and application meta data from a single Config.ini file stored in the file system. This file is created when BI Accelerator generates an iteration of the analytical application and is populated with information that the developer provides. Each iteration uses a separate Config.ini file. The absolute file path to all BI Accelerator-generated applications on a server is written to the registry, while the relative file path to each analytical application is encoded in each package. Multiple versions of the analytical application can coexist on a server because all of the files that are associated with each application, including the Config.ini file, are located in their own folder structure.

The initial task in each package reads the registry and constructs the absolute path to the Config.ini for the analytical application. This location is then recorded in a global variable. A Dynamic Properties task then reads the connection information in the Config.ini file and updates the appropriate global variables in the package. Application meta data is retrieved and the appropriate global variables are set when a task requires this information.
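
The following fragment is a minimal sketch of such a startup task. The registry key, relative path, and global variable names (gsConfigFile, gsAppPath) are placeholders rather than the actual BI Accelerator names:

Option Explicit
Function Main()
   Dim oShell, sRoot
   Set oShell = CreateObject("WScript.Shell")
   ' Read the root folder for generated applications from the registry.
   sRoot = oShell.RegRead("HKEY_LOCAL_MACHINE\SOFTWARE\SampleApp\ApplicationRoot")
   ' Append the relative path encoded in the package and record the result
   ' so that the Dynamic Properties task can locate the Config.ini file.
   DTSGlobalVariables("gsConfigFile").Value = sRoot & _
      DTSGlobalVariables("gsAppPath").Value & "\Config.ini"
   Main = DTSTaskExecResult_Success
End Function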

Global Variable Values

Packages can use global variables to hold values. These values are assigned to package properties and are used in package tasks to modify functionality. For example, global variables can be used to determine whether a given package executes; the batch size of a data import process; the maximum number of errors that are permitted; and the algorithm that is used to determine if data contains inserts, updates, or both. Global variables and their values can either be passed by a parent package to its child packages, or a child package can use global variable values that are different from those of its parent package.

The default value for each global variable is embedded directly into each package at design time. However, to manage the day-to-day operation of the data movement system, DBAs must be able to modify the values of the appropriate global variables in every package dynamically at run time. For example, a DBA may want to use one global variable configuration to load data from HR and another global variable configuration to load data from Sales.

Solution

The BI Accelerator data movement system uses two SQL Server tables to store non-default values for selected global variables:

  • A global variable configuration table stores configuration IDs, including a value for a default configuration. Values that the DBA chooses to override can be associated with the default configuration or with a defined configuration.
  • A global variable override table contains a row for each override value. Each row contains the override value, the name of the package to which the override applies, and a configuration ID for the override value. If the default configuration ID is specified for an override value, the override value for the specified package is used each time the default configuration of the package is executed. If a non-default configuration ID is specified, the override value for the specified package is used whenever the package is executed with that configuration ID. This enables DBAs to create configurations for specific purposes. For example, a DBA may want certain packages to be bypassed when HR data is loaded and other packages to be bypassed when sales data is loaded. In addition, the DBA may want the HR data to be loaded with different execution parameters, such as batch size or error threshold, than the sales data.
    Note   Because child packages in the data movement system are designed to inherit selected global variable values from parent packages, changes to the default values in a parent package flow to its child packages. Therefore, changes to default values need only be made in parent packages, unless a DBA wants to use different values in a child package.

Implementation

After the BI Accelerator data movement system updates the connection information from the Config.ini file, an Execute SQL task runs, retrieving override values for the execution of the package. The Execute SQL task performs a join between the global variable configuration table and the global variable override table and returns a result set containing the override values for the configuration ID associated with the execution of the package. The configuration ID is either the default configuration ID (-1) or a configuration ID that is passed to the package at run time through the DTSRun.exe command line utility. The Execute SQL task uses a rowset output parameter to pass the rowset to an ActiveX task. This ActiveX task loops through the override values in the rowset and updates the appropriate global variable values in the package.
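
A rough sketch of this pattern follows. The global variable and column names are illustrative, not the actual BI Accelerator names; the key point is that the rowset output parameter arrives in the ActiveX task as a disconnected ADO recordset:

Option Explicit
Function Main()
   Dim oRS
   ' The Execute SQL task stored its result set in this global variable
   ' by using a rowset output parameter.
   Set oRS = DTSGlobalVariables("gvOverrideRowset").Value
   Do While Not oRS.EOF
      ' Each row names a global variable and supplies its override value.
      DTSGlobalVariables(oRS("VariableName").Value).Value = oRS("OverrideValue").Value
      oRS.MoveNext
   Loop
   Main = DTSTaskExecResult_Success
End Function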

Because override values are retrieved for each package at run time, DBAs can control the operation of each package individually. However, because the BI Accelerator data movement system passes many of the global variables from parent packages to child packages, DBAs can also choose to override a value at the parent package level and allow the change to flow through to all of its child packages.

Conditional Logic

The Data Driven Query task performs parameterized Transact-SQL operations on rows of data as they are imported into a SQL Server table from a source rowset. The rows of data are held in an in-memory structure defined by a binding table, which must be a physical table that maps to the source rowset. For each row in a source rowset, the Data Driven Query task selects, customizes, and executes one of several Transact-SQL statements. The Transact-SQL statement that is executed on each row is based on an evaluation of the row by an ActiveX transformation script within the Data Driven Query task. Evaluating each row in a table is resource intensive and can seriously degrade package performance. The parser in the Data Driven Query task does not support the use of conditional logic and parameterized, multi-table joins to generate the source rowset, which would enable SQL Server to use set-oriented processing and improve the performance of the Data Driven Query task. Instead, the parser expects the source rowset to be generated from an existing SQL Server table using simple logic.

Solution

The BI Accelerator data movement system solves this problem by using a stored procedure to generate the source rowset for each Data Driven Query task. These stored procedures are parameterized, contain conditional logic, and perform surrogate key lookups using multi-table joins. The use of stored procedures to generate the source rowset in the Data Driven Query tasks provides a number of benefits:

  • The parser in DTS Designer can successfully parse the stored procedure.
  • Conditional logic and surrogate key lookups using multi-table joins are performed based on parameters passed to the stored procedure from the Data Driven Query task.
  • Set-oriented processing enables the Data Driven Query task to be used as a fast in-memory forward cursor to step through the generated rowset and perform selected Transact-SQL operations based on the ActiveX evaluation of each row in the rowset.
  • Each stored procedure is used in multiple Data Driven Query tasks, providing a single place to modify or extend the logic used to generate the rowset for each Data Driven Query task.

Because the rowset generated by the stored procedure does not map to an existing table, BI Accelerator generates a physical table containing the schema for the rowset that the stored procedure generates. This physical table is defined as the binding table for the Data Driven Query task; it does not contain data.

In the BI Accelerator Data Driven Query task, an ActiveX transformation script determines the Transact-SQL operation that is performed on each row, such as an insert or an update. However, the actual operation is performed by an insert or update stored procedure, using the parameterized values passed in by the Data Driven Query task. When the Data Driven Query task is used in this manner, it is very fast. The logic can be easily extended through the stored procedure without requiring any package to be edited.

Implementation

The BI Accelerator data movement system uses Data Driven Query tasks in the DTS packages that update dimension level tables. A single stored procedure is created for each dimension level table. This stored procedure generates the result set used to insert, update, or delete members from its associated table. Conditional logic determines the portion of the stored procedure that runs each time the stored procedure is called. Inserts and deletes do not require a Data Driven Query task. When the stored procedure is called and the update phase of the stored procedure runs, the rowset is returned to the calling Data Driven Query task and held in memory. The ActiveX transformation script in the Data Driven Query task evaluates each row in the result set to determine if the dimension being updated is a Type I or Type II changing dimension. A stored procedure that updates dimensions is called for each row and either updates the dimension member in place or creates a new dimension member and marks the old dimension member as inactive. In either case, no transformation actually occurs in the Data Driven Query task. The transformation initially occurs when the rowset for the Data Driven Query task is generated; the transformation associated with the update occurs through a stored procedure after the Data Driven Query task determines the type of changing dimension that is being updated.
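
The following transformation script is a simplified sketch of this technique. The column names and the Type II test are placeholders; the essential mechanism is that the script copies the source columns to the query parameters and then returns a constant that selects which parameterized query the Data Driven Query task executes for the row:

Option Explicit
Function Main()
   ' Bind the source columns to the parameters of the data driven queries.
   DTSDestination("MemberKey") = DTSSource("MemberKey")
   DTSDestination("MemberName") = DTSSource("MemberName")
   If DTSSource("ChangeType") = "TYPE2" Then
      ' Run the query that inserts a new member row and retires the old one.
      Main = DTSTransformstat_InsertQuery
   Else
      ' Run the query that updates the existing member in place.
      Main = DTSTransformstat_UpdateQuery
   End If
End Function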

Information Flow Between Child Packages and Parent Packages

A parent (or master) package can call and execute a child package through the Execute Package task. This capability allows complex package workflow to be divided into two or more child packages that are called by a parent package. Using multiple packages increases package readability, enables discrete units of work to be encapsulated into individual packages, enables packages to be reused, and simplifies package debugging.

The Execute Package task communicates with a child package using inner and outer package global variables:

  • Inner package global variables are used to set the values specified in the calling Execute Package task to global variables in the child package. The specified values override the default values for these global variables in the child package.
  • Outer package global variables are used to pass the parent package global variables specified in the calling Execute Package task to the child package being executed. If these global variables do not exist in the child package, they are created temporarily. If these global variables exist, the parent package global variables and their values are used instead of identically named global variables in the child package.

DTS provides no mechanism, however, that allows a child package to communicate with the parent package, or to communicate with another child package that is also executed by the parent package.

Solution

The BI Accelerator data movement system solves this problem by configuring each package with two global variables that store pointers: one global variable stores a pointer to the package’s own global variable collection, while the other global variable stores a pointer to the global variable collection of its parent (or calling) package. Through these pointers, child packages can communicate with parent packages and other child packages.

Implementation

When a child package is called by a parent package, the parent package defines the global variable holding the pointer to its own global variable collection as an outer package global variable, and passes that global variable to the child package in an Execute Package task. The child package uses an ActiveX task to access the global variable collection of the parent package directly, by going through the pointer. The ActiveX task in the child package can then directly modify (in memory) any of the global variable values in the global variable collection of the parent package, and read any global variable changes made by another child package.
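
The two fragments below sketch the technique; the global variable names are placeholders. The essential point is that a global variable can hold an object reference, in this case a pointer to a GlobalVariables collection, and that the reference remains valid because the child package runs in the same process:

' In the parent package, before the Execute Package task runs:
Set DTSGlobalVariables("goParentGlobals").Value = _
   DTSGlobalVariables.Parent.GlobalVariables

' In the child package, where goParentGlobals arrives as an outer package
' global variable, an ActiveX task can modify the parent's global
' variables directly through the pointer:
Dim oParentGlobals
Set oParentGlobals = DTSGlobalVariables("goParentGlobals").Value
oParentGlobals("gsDimensionUpdateType").Value = "FULL"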

The BI Accelerator data movement system uses this solution in its dimension level update packages to signal to the parent dimension package the type of dimension update that must be performed, based on the type of update that occurs at each dimension level and the type of changing dimension involved.

Note   This solution cannot be used with the Execute Process task because the Execute Process task spawns its own process. A child package can update the values in the global variable collection of the parent package only if it is running in the same process as the parent package.

Branching

DTS provides precedence constraints to control workflow between tasks in a package based on the completion, success, or failure of a previous task. However, DTS does not provide a mechanism to control workflow within a package based on the value of a global variable. For example, it is quite useful to be able to have your package workflow proceed down either branch A or branch B of the package based on run-time conditions communicated to the package through a global variable value.

Solution

The BI Accelerator data movement system uses an ActiveX task to solve this problem. Based on the value of a global variable, the ActiveX task uses conditional logic at run time to direct the workflow out of the ActiveX task to one of several branches in the package.

Note   Because the name of each task that the ActiveX task can activate is embedded in the ActiveX task itself, to extend the package, you may have to edit the ActiveX task itself instead of inserting a new task at the appropriate point in the package. In addition, this solution makes the workflow in the package more difficult to understand unless you view the actual ActiveX script in the task that performs the branching, because the ActiveX task appears to have multiple success paths when you view the package through DTS Designer.

Implementation

Within each package that requires the capability to branch, an ActiveX task is created at the branch point. The ActiveX script in the task has multiple branches leading out of the task to different branches of the package. These branches are disabled by default. When the ActiveX task runs, the conditional logic in the script determines the branch of the package that is executed.
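
The following script is a simplified sketch of a branch task. The step names and the global variable are placeholders; an actual branch task embeds the real step names of its package, which is why extending the package may require editing the script:

Option Explicit
Function Main()
   Dim oPkg
   Set oPkg = DTSGlobalVariables.Parent
   ' All branch steps are disabled at design time; enable only the branch
   ' selected by the global variable.
   If DTSGlobalVariables("gsAction").Value = "DELETE" Then
      oPkg.Steps("DTSStep_DeleteBranch").DisableStep = False
   Else
      oPkg.Steps("DTSStep_InsertUpdateBranch").DisableStep = False
   End If
   Main = DTSTaskExecResult_Success
End Function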

The BI Accelerator data movement system uses this solution for a variety of purposes:

  • Each package contains an ActiveX branch task, which enables a global variable to control whether the package executes or is bypassed.
  • Each dimension level package contains both an insert and update branch and a delete branch. The ActiveX branch task uses a global variable value passed to it by the parent package to determine which branch to execute. Through the use of this branching solution, a single package performs all dimension insert, update, and delete activity on a dimension level table.
  • Each dimension package contains branches for processing each hierarchy. An ActiveX branch task determines whether the dimension hierarchy is processed or whether the processing steps are bypassed. Again, a single package performs all processing activity on a dimension level table.

Looping

DTS does not provide a mechanism that supports looping through a number of tasks in a package. However, looping through a sequence of package tasks is sometimes the best solution to a problem. For example, when Analysis Services processes a cube that is partitioned by month, every partition in the cube must be processed. The number of partitions to be processed and their names, however, are not known until run time because not all partitions contain updates and new partitions are created every month. While a separate task can be created for each known partition, this is not an elegant solution and does not solve the problem of partitions that are not known at design time. A better solution is to determine the number of partitions that require processing at run time and then execute the cube processing task once for each partition requiring processing. However, this solution requires the package to loop through several tasks until all partitions are processed.

Solution

The BI Accelerator data movement system uses an ActiveX task to solve this problem. The ActiveX task is configured with an ActiveX workflow script that runs before the ActiveX task. This script uses conditional logic to determine whether the ActiveX task runs or whether the workflow is directed back to an earlier task. If the workflow is directed back to an earlier task, each task to be re-executed is reset and then re-executed. This process continues until the looping condition is satisfied.
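
A minimal sketch of such a workflow script follows; the step names and the looping condition are placeholders. Returning DTSStepScriptResult_DontExecuteTask sends the workflow back through the reset steps, while returning DTSStepScriptResult_ExecuteTask lets the loop task itself run and the workflow continue:

Option Explicit
Function Main()
   Dim oPkg
   Set oPkg = DTSGlobalVariables.Parent
   If DTSGlobalVariables("gbMorePartitions").Value = True Then
      ' Reset the earlier steps so that they execute again.
      oPkg.Steps("DTSStep_GetNextPartition").ExecutionStatus = DTSStepExecStat_Waiting
      oPkg.Steps("DTSStep_ProcessPartition").ExecutionStatus = DTSStepExecStat_Waiting
      ' Skip the loop task on this pass.
      Main = DTSStepScriptResult_DontExecuteTask
   Else
      ' All partitions are processed; run the loop task and continue.
      Main = DTSStepScriptResult_ExecuteTask
   End If
End Function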

Note   Because the names of the tasks that are reset by the ActiveX task are embedded in the ActiveX task itself, to extend the package you may have to edit the ActiveX workflow instead of inserting a new step at the appropriate point. In addition, this solution makes it more difficult to understand the workflow in the package without viewing the actual ActiveX workflow script in the ActiveX loop task; when you view the package through DTS Designer, you cannot see the looping mechanism in the task.

Implementation

As mentioned earlier, the BI Accelerator data movement system uses an ActiveX workflow script and loop task to process all partitions that require processing. The package associated with each cube includes the following tasks, which execute in a loop until all required processing has been completed:

  1. An Execute SQL task queries the appropriate fact table in the Staging database to determine the number of distinct months and years contained in the data being added to the cube and stores the result in a rowset global variable.
  2. An ActiveX task retrieves the year and month of the first record in the recordset and stores these values in two separate global variables.
  3. An Execute Package task calls the partition processing package and uses outer package global variables to pass execution values to the package. These variables include the year and month values retrieved from the recordset in step 2, along with a number of other global variables.
  4. After the first partition is processed, an ActiveX loop task runs. The ActiveX workflow script in this task determines whether the last record in the recordset has been processed.
    • If the last record has not been processed, the workflow script resets the return status of steps 2 and 3 to DTSStepExecStat_Waiting instead of executing the current step. The status of the current step is set to DTSStepScriptResult_DontExecuteTask. Step 2 is then re-executed; this time, the next record in the recordset is retrieved and Step 3 causes the next partition to be processed. This continues until all partitions that require processing have been processed.
    • If the workflow script determines that the last record has been processed, the workflow sets the status of the current step to DTSStepScriptResult_ExecuteTask and the task contained in the body of the ActiveX task is executed.

Task Complexity

When you design a single ActiveX task, it is quite easy to retrieve global variable values, use those values to set task properties directly, and then execute the task. While this may be easy from a programming perspective, using multiple steps in a single ActiveX task makes the task more difficult for developers and DBAs to understand and extend. Complex ActiveX tasks make packages more difficult to understand because the full scope of each ActiveX task's functionality is not visible in DTS Designer until each task is opened and its script deciphered. The only visible clue to the functionality of an ActiveX task is its name. Complex ActiveX tasks make packages more difficult to extend if you want to add a task between two of the tasks that are contained in the ActiveX task. This type of extension requires you to modify and extend an existing ActiveX script, making the ActiveX task even more complex and difficult to understand. Debugging errors is also more difficult.

Solution

The BI Accelerator data movement system avoids this problem by splitting multi-step ActiveX tasks into three separate tasks. By creating separate tasks, the BI Practices team accomplished the design goal of making the BI Accelerator tasks and packages easy to understand and extend. The tradeoff is that this solution adds tasks to each package and separates code that logically belongs together; the team accepted this cost for the sake of flexibility.

Implementation

The BI Accelerator data movement system uses the following three tasks to gather and set the properties for processing dimensions and partitions:

  1. An ActiveX task gathers the values to be used and then records them in a set of global variables.
  2. A Dynamic Properties task sets the values of the global variables into the properties of the execution task.
  3. The execution step, such as an Execute SQL task, an Execute Package task, or an Execute Process task, runs.

This three-task approach is much easier to understand and extend than a single ActiveX task that encapsulates these discrete steps.

Package Storage

Using SQL Server to store packages used in the data movement system has several limitations. First, the namespace used by SQL Server for storing packages is flat and does not support the concept of a folder hierarchy. This means that each package stored within an instance of SQL Server must have a unique name. In addition, moving packages stored in a SQL Server instance to another SQL Server instance is cumbersome. To accomplish this, each package must be individually opened and then saved to the new SQL Server instance. Given the number of packages in the data movement system, this is a time-consuming task.

Solution

BI Accelerator solves these problems by storing files and packages from each iteration of the analytical application in separate folder structures in the file system. Storing packages in the file system permits identically named packages from different iterations of the analytical application to be stored on the same server and enables packages to be moved between servers simply by copying them to the new server. In addition, storing packages in the file system makes it easy for DBAs and developers to:

  • Back up packages at the same time as other application files in the file system.
  • Execute packages using the DTSRun.exe command line utility.
  • Integrate packages with a standard source control system, such as Microsoft Visual SourceSafe®.
    Note   To edit a package stored in the file system, use SQL Server Enterprise Manager to open the package in DTS Designer. You cannot open a package in DTS Designer directly from the file system.

Implementation

When BI Accelerator generates an iteration of the analytical application, all of the files associated with that iteration are placed in their own folder structure. This structure stores client files, debug files, DTS packages, import files, temporary work files, and a variety of other files used by the analytical application. Using separate folder structures allows developers to work with different iterations of the same analytical application or iterations of different analytical applications simultaneously.

DTS Tips and Tricks

The BI Practices team used a number of DTS tips and tricks when designing the data movement system for the analytical application. Using these tips and tricks when you design your DTS packages will help you avoid design problems, improve overall performance, increase package usability, and make packages easier to understand and extend.

Create a Common Startup Sequence

Create a common startup sequence of tasks in every package in your data movement system. This common startup sequence allows developers to focus on what needs to be done, and not how. While every task in the startup sequence may not be required in every package, using the same startup sequence in every package makes each package easier to understand and work with.

Include a Bypass Step in Every Package

Include a bypass step in every package as part of the common startup sequence and define an override global variable in every package to control this bypass step. Use a data store, such as SQL Server tables, to group the override values for specific packages into configurations. This enables you to bypass groups of packages as appropriate. For example, you can use one set of packages to load HR data on Wednesday and a different set of packages to load the sales data on Saturday.

Run Analysis Services Processing Tasks in Separate Processes

When DTS executes multiple Analysis Services Processing tasks in the same process (parent and child packages execute in the same process), DTS creates only one instance of the Analysis Services processing component because DTS assumes that this component is free-threaded. However, the Analysis Services Processing task is a Visual Basic component and is not free-threaded; it is written using the single apartment model. This means that the thread that creates the processing component must be the thread that actually executes the processing component. Other COM threads must marshal their execution onto the creating thread and requests from multiple Analysis Services processing tasks queue up at the COM level.

To avoid queuing and ensure parallel processing of Analysis Services Processing tasks, run each processing task in a separate process. To accomplish this, use an Execute Process task to call the DTSRun.exe command line utility and execute the Analysis Services Processing task in a DTS package. The Execute Process task provides a new Microsoft Win32® process, within which the Analysis Services Processing task executes on its own thread.
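
One way to set this up, sketched below with placeholder file, package, and task names, is to have an ActiveX task build the command line for the Execute Process task before it runs:

Option Explicit
Function Main()
   Dim oPkg, oTask
   Set oPkg = DTSGlobalVariables.Parent
   ' The Execute Process task launches DTSRun.exe in a new Win32 process,
   ' so the Analysis Services Processing task in the called package gets
   ' its own instance of the processing component.
   Set oTask = oPkg.Tasks("DTSTask_DTSCreateProcessTask_1").CustomTask
   oTask.ProcessCommandLine = "DTSRun /F """ & _
      DTSGlobalVariables("gsPackagePath").Value & _
      "Process Partition.dts"" /N ""Process Partition"""
   Main = DTSTaskExecResult_Success
End Function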

Note   When running multiple Analysis Services Processing tasks in parallel, OLAP repository locking issues can be a problem if you are accessing a single partition through separate processes. Decision Support Objects (DSO) uses locking modules to ensure that no two processes access the same partition at the same time.

Call DSO from an ActiveX Script

Decision Support Objects (DSO) is the COM interface used to manage Analysis Services cubes and partitions. Call DSO from an ActiveX script when DSO is needed. Using an ActiveX script is an easy and straightforward process, allowing you to use the flexibility of ActiveX for calling DSO in a package. You do not need to create a separate Visual Basic application to execute a DSO task. The following code shows how to use DSO in an ActiveX script to delete cube partitions:

sAnalysisServer = DTSGlobalVariables("gsAnalysisServer_RESERVED").Value
sOlapDB = "SMA 01"
sCube = "Sales"

' Connect to the Analysis server and get the database and cube objects.
Set dsoServer = CreateObject("DSO.Server")
dsoServer.Connect sAnalysisServer
Set dsoDB = dsoServer.mdStores(sOlapDB)
Set dsoCube = dsoDB.mdStores(sCube)

'Delete all partitions with names different from the cube name.
iPartCounter = dsoCube.mdStores.Count
Dim i
For i = iPartCounter To 1 Step -1
   If dsoCube.mdStores(i).Name <> sCube Then
      dsoCube.mdStores.Remove i
   End If
Next

Calling DSO from within a DTS package enables the automatic processing of partitions immediately after they are loaded with new data. Using global variables, DBAs can control the type of automatic processing that occurs.

Check for Unexpected Deadlocks When Troubleshooting

Packages can easily become deadlocked with other packages. Because all packages have their own connection objects, packages running in parallel use separate connections, and each connection looks like a different user to SQL Server. As a result, if multiple packages attempt to access the same resource through different connection objects, the packages can become deadlocked. Because the times at which these packages execute varies, deadlock errors are intermittent, and therefore, hard to track down. To resolve deadlocks that occur, use standard deadlock resolution techniques, such as specifying a locking hint.

Note   With only one connection to a database, a deadlock cannot occur; instead, a queue is created within the connection.

Add an Additional Connection to a Package to Avoid Queuing

Within a single package, only one task can use a connection at a time. Package steps that require the same connection at the same time serialize and create a queue, even if the package appears to be executing these tasks in parallel in DTS Designer. The use of a single connection by multiple package tasks is not always obvious. For example, it is quite visible with some tasks, such as the Transform Data task, but not as obvious with other tasks, such as the ActiveX task.

A task that uses the lookup query functionality encapsulated in the Transform Data and Data Driven Query tasks is required to use a separate connection for the lookup query because only one query can be active on a connection at one time. All BI Accelerator packages contain three connections: the source data store, the binding table data store (used for lookup queries), and the destination data store.
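
For example (the lookup name and columns are illustrative), a transformation script in a Transform Data task can call a lookup query that runs on its own connection while the main transformation uses the source and destination connections:

' "GetCustomerKey" is a lookup query defined on the task; it must use a
' connection other than the one feeding the transformation.
DTSDestination("CustomerKey") = DTSLookups("GetCustomerKey").Execute(DTSSource("CustomerID"))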

Note   If you need more than three connections in a single package, your data movement system is probably not modular enough.

Use a Global Variable to Capture the Number of Records Processed in a Transform Data or Data Driven Query Task

There is no straightforward way to determine and output the number of records that are processed by a Transform Data or Data Driven Query task, because the information is contained within the running task itself. The following code shows you how to capture this information in a global variable by adding an additional data pump stage to the task.

Option Explicit
Function PostSourceMain()
   ' The post source data phase runs after all rows are processed, so
   ' DestinationRowsComplete contains the total number of rows written.
   DTSGlobalVariables("giInsertRows").Value = _
      CLng(DTSTransformPhaseInfo.DestinationRowsComplete)
   PostSourceMain = DTSTransformstat_OK
End Function

After you capture this information in a global variable, it is easy to print or log it.

DTS Programming Guidelines

The BI Practices team used the following guidelines when designing the BI Accelerator data movement system. Following these guidelines when designing your DTS data movement system will improve its design, performance, and manageability.

Use Subpackages

Create a hierarchy of packages, with a master package calling child packages and the child packages calling other child packages as appropriate. This programming practice provides the following advantages:

  • You can create modularized packages to reduce the workflow complexity in individual packages and make your DTS data movement system easier to understand. For example, the BI Accelerator data movement system uses a separate child package for each dimension, dimension level, and cube.
  • You can create packages that run in parallel. For example, the BI Accelerator data movement system can execute multiple dimension level update packages simultaneously.
  • You can create packages that perform standardized tasks, incorporating specific functionality only once instead of duplicating it in multiple packages. This way, you can modify or extend this functionality in a single location and easily maintain functional consistency across multiple packages. For example, the BI Accelerator data movement system uses a single master error package that is called by all packages when an error occurs.
  • You can create packages that are dynamically configured and executed based on the global variable values passed from a parent package at run time. For example, the BI Accelerator data movement system uses a single package for processing dimension hierarchies and a single package for processing partitions. These packages are called as needed by modularized packages for each cube partition and dimension hierarchy. The processing performed by each of these packages is determined by the values passed from a parent package at run time, including the name of the dimension hierarchy or cube partition and the type of processing (full or incremental).
  • You can re-execute a child package more easily than you can re-execute a series of steps in a single package. This is particularly useful for troubleshooting.
  • You can modify, extend, or update an individual package independent of other packages in the DTS data movement system.

Do Not Modify Concurrency Settings

Do not modify the maximum task concurrency setting in any child package. Tune the number of packages that run in parallel by configuring the task concurrency setting in the master package. This enables you to use the task concurrency setting in the master package to control the number of child packages that the parent package executes concurrently.

Do Not Use the Save_As_VB Option

Do not use the Save_As_VB option in DTS Designer to save packages as Visual Basic code. Create packages by hand in Microsoft Visual Studio® and then save them as DTS packages. Use the code generated by DTS Designer as example code only. The code that DTS Designer generates is not necessarily modular, creates too many variables, and does not necessarily initialize all of them. In addition, the generated transformations are not particularly effective. Finally, the Visual Basic generating tool in DTS Designer does not reverse engineer symbolic offsets; you must do this yourself to see the constant and not just the parameter.

Use Integrated Security

Use integrated security in your connection objects when connecting to a data source. This makes moving your DTS data movement system from one server to another easy. Integrated security also makes the DTS data movement system more secure because credentials are not saved in connection objects.

Note   Saving credentials in connection objects can compromise security and should be avoided when possible.

With integrated security, the domain to which the user executing the package belongs authenticates the user's credentials at run time, enabling passwords to be changed without requiring changes in packages. This makes it easy to keep the DTS data movement system operating environment secure. Remember that the execution credentials come from the executor of the master package, and not the package itself. The executor is either the interactive user or the SQL Server Agent service account.
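
For example, when a package is created programmatically (the connection name, server, and database below are placeholders), the connection can be configured for Windows Authentication so that no credentials are stored in the package:

Option Explicit
Dim oPkg, oConn
Set oPkg = CreateObject("DTS.Package2")
' Create a SQL Server OLE DB connection that uses integrated security.
Set oConn = oPkg.Connections.New("SQLOLEDB")
oConn.ID = 1
oConn.Name = "Subject Matter"
oConn.DataSource = "(local)"
oConn.Catalog = "SubjectMatter"
oConn.UseTrustedConnection = True
oPkg.Connections.Add oConn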

Consider the Execution Environment

Consider the environment and the security context in which the packages will run. Developers frequently create and test DTS packages interactively on a development computer, only to have them fail when they are used in SQL Server Agent jobs or executed from a client workstation.

Execution location

Packages run either on a client computer or a server, based on how they are called:

  • Packages that are executed by a SQL Server Agent job always run on the server in the security context of the SQL Server Agent service account. If SQL Server Agent is using the local system account, the package has no permissions outside of the server that is running SQL Server.
  • Packages whose execution is initiated on a client computer run in the security context of the user account of the executor of the packages.

The location in which a package runs affects its access to files stored in the file system. A package that attempts to access a file stored on the C: drive of a server can fail if that package is executed from a client computer, unless the necessary file is also located on the C: drive of the client computer.

Security context

Package security context is based on the executor of the package.

  • If a package is executed by a SQL Server job, the package (and each of its child packages) runs in the security context of the owner of the job.
    • If the owner of a SQL Server job is not a member of the sysadmin server role in the SQL Server instance where the job runs, the package executed by the job runs in the security context of the account that is set up as the SQL Agent Proxy Account. By default, this account is not enabled and only members of the sysadmin server role can execute CmdExec and ActiveScripting job steps. If the package contains these job steps and the proxy account has either not been configured or does not have sufficient access rights, the package fails.
      Note   If a package runs in the security context of the SQL Agent Proxy Account, the proxy account must have read and write permissions to the Temp directory of the SQL Server Agent service account. For more information, see Q269074: INF: How to Run a DTS Package as a Scheduled Job.
    • If the owner of a SQL Server Agent job is a member of the sysadmin server role in the SQL Server instance where the job runs, the package executed by the job runs in the security context of the SQL Server Agent service account.
  • If you run a package manually by using the DTSRun.exe command line utility, the package runs in the security context of the user who is logged on to the client computer from which the package is executed.
  • If an Execute Process task executes a package by using the DTSRun.exe command line utility, the package runs in the security context of the parent package that is running the Execute Process task.
  • If a package is executed through the xp_cmdshell extended stored procedure, the security context of the package depends on the security context of the user who runs xp_cmdshell:
    • If the user is not a member of the sysadmin server role in the SQL Server instance where the job runs, the package executed by the job runs in the security context of the account that is set up as the SQL Agent Proxy Account.
    • If the user is a member of the sysadmin server role in the SQL Server instance where the job runs, a package executed by the job runs in the security context of the SQL Server service account.

Ensure that when a package attempts to access the file system, the files are available from the computer on which the package runs and that the files are accessible based on the security context in which the package executes.

Design Packages for Data Throughput or Package Flexibility

Design packages for either data throughput or package flexibility, but not for both. For example, when designing packages to bulk insert data into a staging database, design the packages to maximize data throughput. On the other hand, when designing packages to transform the data moving through the packages, design the packages for flexibility, maximizing performance when possible.

Maximizing data throughput

When trying to maximize data throughput through the packages in a DTS data movement system, follow these guidelines:

  • Use the Bulk Insert task, which uses the capabilities of the Transact-SQL Bulk Insert statement. This task imports data, without transformation, faster than any other task. Using the Bulk Insert task allows you to import the data quickly from the RDBMS system into the Staging database, minimizing the performance impact on the RDBMS and maximizing data throughput into the Staging database. You can then use Transform Data and Data Driven Query tasks to perform any needed transformations as you move the data into the Subject Matter database.
  • Minimize logging by setting the SQL Server destination database to use either the Bulk-Logged Recovery or the Simple Recovery model. These recovery models are faster than the Full Recovery model. The Simple Recovery model yields the fastest performance.
  • Perform all tasks on one server when possible to minimize delays caused by network performance issues.
  • Minimize the use of indexes on the destination tables because inserts, updates, and deletes to tables containing indexes require index maintenance by SQL Server as the data changes occur. Performing index maintenance while data is being imported slows overall throughput. The BI Accelerator data movement system uses no indexes on the fact tables in the Staging and Subject Matter databases to maximize insert performance. If you maintain an index on a table into which you insert data, ensure that the table has zero rows to maximize performance.
  • Minimize the use of ActiveX scripts, Data Driven Query tasks, and Transform Data tasks. Performing data transformations and lookups while data is being imported takes time and slows overall throughput.
  • Ensure that the target table is not being replicated and does not contain triggers. Executing triggers and replicating rows as they are imported requires server resources and slows overall SQL Server performance.
  • Use the TABLOCK hint to ensure that a table-level lock is taken for the duration of the bulk-copy operation.

For an example of a master package that is optimized for throughput, see the Master Import package in the BI Accelerator data movement system.

Maximizing flexibility

When maximizing the flexibility of the packages to perform data transformations, follow these guidelines:

  • Use Transform Data tasks to perform stock transformations with maximum performance.
  • Use ActiveX tasks for maximum flexibility when performing transformations, but at the expense of performance.
  • Use Data Driven Query tasks to evaluate data on a row-by-row basis and use set-processing to generate the source rowset to increase the performance of the Data Driven Query task.

For an example of a master package that is optimized for flexibility, see the Master Update package in the BI Accelerator data movement system.

Conclusion

You can design and create a DTS data movement system that is easy for developers to understand and extend, and that is easy for DBAs to understand and manage. Understanding the solutions to the problems faced by the BI Practices team can help you improve your DTS data movement system. Use the tips, tricks, and guidelines outlined in this paper to help you design your system for usability, performance and manageability.

Finding More Information

For additional information about BI Accelerator and DTS, see the SQL Server Accelerator for Business Intelligence documentation and the DTS documentation in SQL Server Books Online.
