Chapter 21 - Monitoring the DTS Multiphase Data Pump in Visual Basic
The purpose of this chapter is to illustrate a technique for monitoring the phases of the Data Transformation Services (DTS) data pump process from an application external to the DTS package. This chapter may aid in both the understanding of multiphase data pump behavior and in troubleshooting. It includes a sample solution that graphically depicts phase activity and allows you to set phase breakpoints during the execution of a package that contains a Transform Data task or a Data Driven Query task. This solution uses a COM+ event class to communicate to a Microsoft® Visual Basic® application.
This chapter assumes that you have some experience setting up Microsoft ActiveX® script transformations for multiphase data pump processes. This chapter also assumes that the multiphase data pump has been exposed from within SQL Server Enterprise Manager. If necessary, please review SQL Server Books Online before proceeding further.
Exposing the Multiphase Data Pump
To understand the architecture of the sample solution presented in this chapter, it is important to understand what multiphase data pump programming interfaces are available, the context in which a DTS package executes, and some rationale for developing the sample solution.
The custom transform interface that allows access to each phase of the data pump is only exposed natively through COM in Microsoft Visual C++®. But a full automation interface to the data pump is available to Visual Basic.
Elements of the multiphase data pump are exposed within the ActiveX script transformation. A data structure that indicates phase information (for example, transform status, source and destination row counts, and error information) is passed into each of these phases.
Package Execution Context
DTS packages are always run as a client-side process. The context of the running package depends on the calling application. If a user called the package through an application such as DTS Designer or DTS Run, the package runs in the context of that user account. If a user schedules a package as part of a SQL Server Agent job, then the package runs in the context of the SQLServerAgent service account or the designated non-sysadmin job step proxy account. For more information, search for Knowledge Base article 266663 at http://support.microsoft.com/directory/.
Because DTS is free threaded, any COM object that is called by a step or a task must support this threading model. If an object does not support this threading model, then the step utilizing it must be set to execute on the main thread. Custom tasks written in Visual Basic are forced by DTS to execute on the main package thread. Although packages that are free threaded can be created and executed in Visual Basic, any package created in Visual Basic that supports events must return those events to the main thread. Tasks, ActiveX scripts, and any objects instantiated within ActiveX script are usually executed as in-process objects within the DTS package.
Troubleshooting the Data Pump
By default, the data pump runs until it is complete or until a maximum number of errors are encountered. Identify what happened during package execution by examining error logs, error files, and exception files.
Because the progress indicator does not provide complete information about the state of the multiphase data pump, you may find it helpful to expose that information and set breakpoints for each phase, to step through each phase, and to monitor state and error information through a user interface.
Note You can expose a user interface while executing a package. For example, when a package is run from SQL Server Enterprise Manager, the Executing DTS Package dialog box displays the progress of package execution. However, it is not recommended that you embed a user interface into the package if your package will be run in an environment that does not support a user interface. For example, if you embed a user interface into a package that has been scheduled through SQL Server Agent, the package may stop responding.
Multiphase Data Pump Review
This section does not go into great detail about multiphase data pump functionality. Rather, it is provided as a review and as context for subsequent code examples. For more information about multiphase data pump functionality, see SQL Server Books Online.
Basic Multiphase Data Pump Process
When copying a row of data from source to destination, the data pump follows this basic process: individual rows of data are read from a data source into a batch buffer, transformed on a row-by-row basis, and written to a data destination. The different phases that the process goes through are controlled by the data pump architecture and the status and result of each phase, unless you override this behavior programmatically.
The data pump goes through many different phases, and each phase is meaningful only if you understand how the data pump arrived at that phase and what it will do after it leaves a phase. All this information is controlled by a combination of the transformation status and the phase status.
There are many variations on the status of a given row transformation phase. Status codes can be strictly informative, or they can be used to control the data pump. For more information on how these types of status codes can be combined, see "DTSTransformStatus" in SQL Server Books Online.
Default conversions (if any) or phase has succeeded or failed.
Call Error Sink
Instruct the data pump to call the error sink handler. It is possible to keep processing phases in spite of errors if the appropriate status code is set.
Variations of this status code instruct the data pump not to retrieve the next source row.
Variations of this status code instruct the data pump not to forward the row on to the destination data source.
Multiphase Data Pump Phases
The following diagram depicts a flow chart of the defined data pump phases, along with two additional implied phases. Directional lines show the possible flow between each phase, based on processing paths and transformation status.
The data pump does not have access to the data source and destination at all times. Source data operations are available in all phases from the Fetch Row phase until the Post Source phase. Destination data operations are not available during the Insert Success and Insert Failure phases. No data operations are available during the Pump Complete phase.
The Pre Source phase is executed before source data is fetched. A skip fetch status can be returned here to create a loop to be used for initialization or the polling of a data source.
You can add a pre-source data pump function to read or write information to a file, or to initialize objects, connections, and memory so they can be used during subsequent phases.
Fetch Source Row (Implied)
Although there is not an explicit phase defined for fetching a row, there is an implied event. This implied event can be seen by the branching paths that a data pump process can follow.
This is the only phase that is available in DTS Designer without your activating the multiphase data pump feature.
Note To activate this feature, select the Show multi-phase pump in DTS Designer check box. In this phase, a single row of source data is transformed and mapped to a destination. The transformation status returned from this phase influences subsequent branching.
This phase occurs when the Row Transform phase returns a status that is a variation on an error or exception status. The status returned from this phase controls subsequent branching.
This phase is referred to in SQL Server Books Online as a subphase of the Post Row Transform phase. The Post Row Transform phase is not a real phase, but rather a conceptual grouping of the phases that occur after each Row Transform phase. On the diagram, the Post Row Transform phase includes the boxes for Transform Success, Transform Failure, Insert Row, Insert Success, and Insert Failure.
Transform Success (Implied)
Although there is not an explicit phase defined to indicate a successful transformation, there is an implied event whenever the Transform Failure phase did not occur for a given row. It is listed on the flow chart for illustrative purposes only.
Insert Row (Implied)
Although there is not an explicit phase defined to indicate when a row is inserted, there is an implied event, which is seen by the branching paths that a data pump process can follow. This implied event would be skipped if the Row Transform status were set appropriately.
In the case of a Data Driven Query task, this event indicates the execution of a particular query, which is not necessarily an insert operation.
This phase signifies the success of an Insert operation, or the success of any query if the transformation is part of a Data Driven Query task. You cannot specify any destination operations in the returned status.
This phase signifies the failure of an Insert operation, or the failure of any query if the transformation is part of a Data Driven Query task. You cannot specify any destination operations in the returned status.
Source Row Processing Complete (Implied)
This event indicates that a single row has been completely processed and that the next row, if there is one, should be fetched.
This phase will occur at least once during the data pump process. If the destination connection is using the SQL Server OLE DB driver, if the Use Fast Load property is enabled, and if the Insert Batch Size property is set to anything besides zero, this phase will occur every time a batch of rows is flushed out to the destination data source. Otherwise, it occurs only at the end of the data transfer.
This phase occurs after all source data has been retrieved, transformed, and inserted, but before a final Batch Complete phase.
Note In the ActiveX Script Transformation Properties dialog box, under Entry Functions, the phases are listed out of order. Also, if you click AutoGen., the last two phases are listed out of order from their typical execution sequence. However, this will not affect the execution of these phases.
This is the last phase to occur. Neither source nor destination data is available. It is generally used for cleanup operations.
Properties that Impact Phases
The following properties, which can be manipulated either programmatically or graphically through DTS Designer, establish parameters for the data pump process. For more information, see SQL Server Books Online.
Use Fast Load
This property specifies that the data pump should use high-speed bulk-load processing. Batches are processed by the destination as a group instead of row by row. This property is only available on the Transform Data task.
Insert Batch Size
This property controls the number of destination records that are committed as a single batch. Note: If a record fails, it is not counted as a member of the batch. This property only applies when the destination connection is using the Microsoft OLE DB Provider for SQL Server, and the Use Fast Load property has been set. This property is only available on the Transform Data task.
Max Error Count
This property sets a limit on the number of unhandled errors that can occur within the data pump process before the task terminates. It is possible to intercept different types of transformation errors and incorporate custom error handling, thereby avoiding the limit set by this property.
Sample Monitoring Solution
The following sample solution enables you to monitor the phases of the data pump from an external client application while the package is running.
The sample code for this solution is divided into a COM+ Event Class, a Visual Basic Project for publishing events, a Visual Basic Project for subscribing to events, and a DTS package. The sample code is located on the Microsoft SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\MonitorDTS. For the code to work, the appropriate DLLs and EXE need to be copied and registered. A batch file is included in the sample code folder which registers the DLLs, but the Component Services registration needs to be done manually, as specified below.
The sample solution was developed on the Microsoft Windows® 2000 platform, using SQL Server® 2000, and Visual Basic 6.0 (SP4).
The monitoring solution is based on the following ideas:
View Phases as Events
For the purpose of this application, the occurrence of and the circumstances leading into each phase can be thought of as an explicit event. Phases are actually handled programmatically as events in Visual C++.
ActiveX Scripting Calls
It is possible to create your own Visual Basic COM object and reference it from within the ActiveX scripting environment. Within an ActiveX script transformation, you can create scripting functions that are called at the end of each data pump phase. Each event function that is called from within the ActiveX script could in turn call the object and pass its state and status information to that object.
COM+ Managed Events
The crux of the monitoring problem revolves around handling communication and synchronization between client-side DTS packages and out-of-process monitoring applications. The sample solution will use Windows 2000 Component Services to forward and receive COM+ events for interprocess communications.
A COM+ event class will be created as an ActiveX DLL and registered with Component Services. That event class can then be referenced by other applications, making it possible for applications, with minimal code, to send and receive COM+ events through Component Services.
Visual Basic Publish and Subscribe Applications
Two applications, each of which interface with the COM+ event class, are needed. One application will relay information from the ActiveX scripting environment as a COM+ event. The other will listen for and implement the COM+ events through Component Services.
COM+ Event Class: MonitorDTSEvents.DLL
The sample Visual Basic project used to create the MonitorDTSEvents.DLL event class is located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\MonitorDTS\EventClass.
DTSTransformPhaseInfo Data Structure
The information relating to phase state and status is contained in a data structure called DTSTransformPhaseInfo. This includes the number of rows read, the number of rows successfully passed through to the destination, the number of errors encountered, the current phase, and the last transformation error status.
Creating the Event Class DLL
The COM+ event class is created as an ActiveX DLL called MonitorDTSEvents, with a description of "Monitor DTS Events." It is set for unattended execution and retained in memory.
Each event is a separate class and is only a placeholder function that defines the interface to the event. All parameters must be passed by value, as they will be copied to multiple applications by COM+.
For this application, a single event class called MPDPEvent is defined that accepts a user-defined parameter called MonitorID, the individual DTSTransformPhaseInfo fields as required parameters, and an optional variant field that will be used to pass an array of phase counters. The event class is defined as follows:
Public Sub MPDPEvent( _ ByVal MonitorID As Integer, _ ByVal CurrentPhase As Integer, _ ByVal CurrentSourceRow As Long, _ ByVal DestinationRowsComplete As Long, _ ByVal ErrorCode As Long, _ ByVal ErrorRows As Long, _ ByVal TransformStatus As Long, _ Optional ByVal PhaseOrdinal As Integer, _ Optional ByVal PhaseCounters As Variant) End Sub
Registering the DLL
The precompiled MonitorDTSEvents DLL can be registered by issuing the following command in the MonitorDTS\EventClass folder: regsvr32 MonitorDTSEvents.DLL. A batch file is provided in the sample code folder to do this. If you used the Visual Basic project to create the DLL, no further steps are necessary.
Registering the COM+ Component
The event class must be properly registered as part of a COM+ application within Component Services.
To register the newly created event class
Bring up the Component Services MMC console and navigate to the COM+ Applications folder
Right-click COM+ Application, point to New, and then click Application.
In the Welcome to the COM Application Install Wizard screen, click Next, and then click Create an empty application.
In the Create Empty Application screen, type MonitorDTS in the name box and then click Library Application.
Open the MonitorDTS application, right-click Components, point to New, and then click Component.
In the Import or Install Component Screen, click Install new event class(es).
In the Select Files to Install screen, find and open MonitorDTSEvents.dll.
In the Install new event class screen, select the Find MonitorDTSEvents.dll check box.
Right-click MonitorDTSEvents.MPDP Event, click Properties, click the Advanced tab, and then click Fire in parallel.
Selecting this check box permits multiple applications to access each event independently of one another.
Publisher Application: MonitorDTS.DLL
The sample Visual Basic project used to create the publisher application MonitorDTS.DLL is located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\MonitorDTS\Publisher.
A method called PhaseRelay in the publisher application will be called by ActiveX script. The publisher needs to identify which phase the data pump is in, manage an array of phase counters, and only process events with the same MonitorID that was first used with a given instance of the publisher. If appropriate, it will fire a COM+ event.
You can step through this process, or you can access the provided code samples. The publisher application is structured in the following way.
Creating the Publisher Application
The publisher application is created as an ActiveX DLL called MonitorDTS, with a description of "Monitor DTS Publisher." This application has its project properties set for unattended execution, and is retained in memory. A project reference to the Monitor DTS Event object library (MonitorDTSEvents.dll) is enabled.
The publisher application is implemented entirely within a class called PhaseRelay. This class processes the method call from the ActiveX script, and in turn fires the MPDPEvent COM+ event.
Registering the DLL
The precompiled MonitorDTS DLL can be registered by issuing the following command in the MonitorDTS\Publisher folder: regsvr32 MonitorDTS.DLL. A batch file is provided in the sample code folder to do this. If the reader used the Visual Basic project to create the DLL, then no further steps are necessary.
Initializing the PhaseRelay Class
The MonitorID for a given instance is not defined until the first time the PhaseRelay method is called.
An array called mPhaseArray is defined which contains the expected allowable return values for CurrentPhase. The ordinal position of each value in the array will later be used to reference the phase.
An array is defined that contains counters for each observable phase. The mPhaseArray array will be used to look up the ordinal position that correlates to the position of the counter for a given phase. Each value in the counter array is set initially at zero.
An instance of the MPDPEvent COM+ event class called objDTSMonitor is created when the PhaseRelay class is instantiated. No other coding is required, as Component Services handles all of the communications.
Identifying the Phase
The MonitorID parameter value for a given event is checked against the value used when the object was instantiated. If the values do not match, no further handling is done for the event.
The array is searched for a value that matches the CurrentPhase value for a given phase, and the ordinal position within the array is used to identify that phase.
If the phase value indicates a completed batch, then the counters for the Row Transform and Transform Failure phases are reset to zero for each batch.
Phase counters are maintained within the publisher application to record how many times the event was fired. They are not maintained in the subscriber application since the subscriber may not observe all of the posted events, depending upon when subscriber was launched.
Firing the Event
The MPDPEvent COM+ event is fired, relaying the information sent from the ActiveX script, with the addition of the optional phase counter array and ordinal position.
Component Services relies on applications to register and deregister themselves properly. If an application does not do this properly, then an error will be propagated back to the publishing applications that are sending events. An error handler is put in place that generically detects any error encountered during the event call and attempts to remedy the error by simply disconnecting and reconnecting to the COM+ event.
Setting the objDTSMonitor object to nothing cleanly shuts down the publisher application's Component Services event handling.
Subscriber Application: MonitorDTSWatch.EXE
The sample Visual Basic project used to create the subscriber application MonitorDTSWatch.EXE is located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\MonitorDTS\Subscriber.
The Visual Basic ActiveX executable application will register itself at run-time with Component Services, and implement the MPDPEvent COM+ events that are fired by the publisher application. It will display the information relayed by the COM+ events on a form as the DTS package runs in the background.
You can step through this process, or you can access the provided code samples. The subscriber application is structured in the following way.
Creating the Subscriber Application
The subscriber application is created as an ActiveX executable called MonitorDTSWatch, with a description of "Monitor DTS Watcher". This application has its project properties set for unattended execution, for is retained in memory. Its startup object is set to Sub Main, and its start mode is set to standalone.
A project reference to the Monitor DTS Event object library (MonitorDTSEvents.dll) is enabled, as well as one to the COM+ 1.0 Admin Type Library (comadmin.dll).
Initialize the global array containing verbose phase information. The ordinal position of each phase in the array correlates to the ordinal position determined for the phase counters in the publishing application. The actual CurrentPhase values are in the array in ascending order.
Initialize the global array containing the verbose transformation status information. The actual TransformationStatus values are in the array in ascending order.
Declare a public constant that contains the Component Services event class ID GUID for the registered phase info COM+ event class. This value needs to be copied and pasted from the properties of the event with Component Services.
Initializing the Subscriber Class
Upon form load, create a transient subscription with Component Services as an implementer of the MPDPEvent COM+ event, using the event class ID GUID defined earlier. For more information about this technique, see Knowledge Base article #250292 at http://support.microsoft.com/directory/.
Receive the MPDPEvent COM+ event, and discard the message if it contains a MonitorID different than the one intended for the form. This prevents double counting of events from different publishers.
Pass the parameters received from the COM+ event to a form subroutine that posts the information to a form.
The MonitorDTSWatch main form graphically contains a flow chart of the defined data pump phases, along with two additional implied phases. Directional lines show the possible flow between each phase.
The phases themselves are represented by an array of CheckBox controls displayed in graphical style. The ordinal position of each button in the control array corresponds to the ordinal position in the global array containing verbose phase information.
The caption of each CheckBox control is updated with the name of the phase, appended by the current counter value for that respective phase.
The color of the CheckBox control representing the current phase is set to green. The color of the control representing the previous phase is set to orange. You should be able to discern what happened in between each phase by tracing the path of the directional lines between the last phase and the current phase.
Phase breakpoint functionality is built into the form. If the value of a given CheckBox control is set to checked (depressed), then the application will pause the next time it receives an event for that CheckBox's phase. It will then display a separate modeless form detailing the transform phase information at that point in time, and temporarily halt the data pump. While this information is displayed, it is possible to go back to the main form and check or uncheck any of the CheckBox controls. The important implication here is that the event handling is blocking, meaning that the data pump cannot proceed until the posting subroutine completes.
It is necessary to deregister the subscriber application from Component Services before shutting down. This is essentially the reverse of the procedure used to register the application during the loading of the main form. Because a COM+ related error handler was incorporated in the publisher application, abruptly shutting down the subscriber will not affect the publisher. The COM+ event handler for the subscriber application will eventually time out, but as a matter of practice you should deregister the application in an orderly fashion.
DTS Package: MonitorDTS Sample.DTS
To illustrate how the monitoring code works, we will create a simple DTS package that uses a Transform Data task and the Publisher Visual Basic application to relay phase information to a Subscriber Visual Basic application by way of COM+ events.
You can step through this process, or you can access the provided code samples.
Creating the DTS Package
Open and review the sample DTS package called 'MonitorDTS Sample.dts', which is located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\MonitorDTS\DTS Package. Alternatively, to set up the DTS package:
Activate the multiphase data pump in DTS Designer. For detailed instructions, see SQL Server Books Online.
Create a simple DTS package that consists of a SQL Server connection to the Northwind database and a SQL Server connection to the tempdb database and a transformation between the two. The source table for the transformation should be the Northwind.dbo.Orders table.
Set the destination to create an identical table structure called Orders2.
Delete all existing transformations, select all fields, and create a single new ActiveX script transformation.
Set the Insert Batch Size to 250 records.
Save the script, task, and package.
Setting up the ActiveX Script Transformation
Copy and paste the sample ActiveX script transformation, which implements the multiphase functions. The file that contains this script is called 'MPDP Transform.txt' and is located on the SQL Server 2000 Resource Kit CD-ROM in the folder, \ToolsAndSamples\MonitorDTS\DTS Package. Alternatively, to set up the data pump script: Go into the properties of the new transformation to edit its ActiveX script.
Click the Phases tab, and select the check boxes for all eight phases that are shown.
Click AutoGenerate to create Visual Basic script placeholder event functions for each phase. Alternatively, copy a version of the completed transformation script from the file called 'MPDP Transform.txt', which is located on the SQL Server 2000 Resource Kit CD-ROM in the above mentioned folder.
At the top of the script, outside of any of the functions, add the following code. This code creates a global function called PhaseRelay. The PhaseRelay function accepts a parameter that is to be used as a user-defined monitor ID. The script should look like:
Dim oMonitor Function PhaseRelay( MonitorID ) 'Set Status Info within Monitor object & process the event oMonitor.PhaseRelay MonitorID _ ,DTSTransformPhaseInfo.CurrentPhase _ ,DTSTransformPhaseInfo.CurrentSourceRow _ ,DTSTransformPhaseInfo.DestinationRowsComplete _ ,DTSTransformPhaseInfo.ErrorCode _ ,DTSTransformPhaseInfo.ErrorRows _ ,DTSTransformPhaseInfo.TransformStatus End Function
Inside of the Pre Source phase entry function, create a new instance of the DTSMonitor.DLL publisher application, set the phase status to OK, and call the PhaseRelay function:
Function PreSourceMain() 'Create the monitor set oMonitor = CreateObject("MonitorDTS.PhaseRelay") PreSourceMain = DTSTransformStat_OK PhaseRelay(1) End Function
Inside of the Pump Complete phase entry function, relay the phase information passed into the function, preserve the transform status, and clean up and shut down the Publisher application:
Function PumpCompleteMain() PhaseRelay(1) 'Pass on the TransformStatus PumpCompleteMain = DTSTransformPhaseInfo.TransformStatus 'Release the monitor object set oMonitor = nothing End Function
At the end of the Row Transform phase entry function, just before the function ends, call the PhaseRelay function to report the row transform status that the script determined for that phase:
Function RowTransformMain() ... RowTransformMain = DTSTransformStat_OK PhaseRelay(1) End Function
Inside of the Insert Success and Insert Failure phase entry functions, relay the phase information at the beginning of the function, and set the phase status to OK:
Function InsertFailureMain() PhaseRelay(1) InsertFailureMain = DTSTransformStat_OK End Function '------------------------------------------------------- Function BatchCompleteMain() PhaseRelay(1) 'Pass on the TransformStatus BatchCompleteMain = DTSTransformPhaseInfo.TransformStatus End Function
Inside of the other phase entry functions, relay the phase information passed into each function, and preserve the transform status:
Function TransFailureMain() PhaseRelay(1) 'Pass on the TransformStatus TransFailureMain = DTSTransformPhaseInfo.TransformStatus End Function '------------------------------------------------------- Function BatchCompleteMain() PhaseRelay(1) 'Pass on the TransformStatus BatchCompleteMain = DTSTransformPhaseInfo.TransformStatus End Function '------------------------------------------------------- Function PostSourceMain() PhaseRelay(1) 'Pass on the TransformStatus PostSourceMain = DTSTransformPhaseInfo.TransformStatus End Function
Executing the Solution
To step through the solution presented above, follow these steps:
Execute the DTS package and verify that it completes and processes 830 rows without error.
Start up the subscriber application by executing MonitorDTSWatch.exe. The form with the flow diagram should come up with no activity. Select the phases for which you would like to set initial breakpoints.
Execute the DTS package and monitor the activity using MonitorDTSWatch.exe. Use the breakpoints to examine phase information and transformation status. Trace the process flow lines between the current phase, which is green, and the previous phase, which is orange.
Start another instance of the subscriber application. Notice how multiple subscribers can monitor and implement COM+ event calls from a single publisher.
There are many opportunities for enhancing this solution. For example:
Create other COM+ events. The architecture now in place can easily support different types of events. Consider creating separate events or handlers for different transformation status or error codes.
Record the phase information to a journal file or database table for later examination. Incorporate that feedback into a new event handler that replays a data pump process.
Push data fields out across events or selected rows. Recycle the data out-of-process from DTS.
Create a centralized monitor that could easily receive events from multiple monitors.
Incorporate Microsoft Messaging Queuing into the publisher and subscriber applications to enable asynchronous monitoring.