Designing Package Data Flow
The data flow in a Integration Services package is constructed by using different types of data flow elements: sources that extract data, transformations that modify and aggregate data, destinations that load data, and paths that connect the outputs and inputs of data flow components into a data flow.
You create the data flow in a package by using the data flow designer, the design surface on the Data Flow tab in SSIS Designer.
Creating a data flow includes the following steps:
Adding one or more sources to extract data from files and databases.
Adding the transformations that meet the business requirements of the package. A data flow is not required to include transformations.
Connecting data flow components by connecting the output of sources and transformations to the input of transformations and destinations.
Adding one or more destinations to load data into data stores such as files and databases.
Configuring error outputs on components to handle problems, such as errors or data values that are truncated.
If the data flow includes components that connect to data sources, you must also add connection managers to the package. You can add connection managers while working in the data flow designer, but you can also add them when the Control Flow or Event Handlers tab is active.
When you create a new package, you can also use a wizard to help you configure connection managers, sources, and destinations correctly. For more information, see Creating a Package by Running a Wizard.
SSIS Designer also includes annotations that you can use to make the data flow self-documenting.
When the Data Flow tab is active, SSIS Designer shows the design surface for creating the data flows in a package, and the Connection Managers area for adding the connection managers that the package uses. Meanwhile, the Toolbox changes to contain nodes for Data Flow Sources, Data Flow Transformations, and Data Flow Destinations.
The following diagram shows the data flow of a simple package in the data flow designer. The data flow shown in the diagram consists of a source with a regular output and an error output, two transformations, and two destinations.
Integration Services provides different sources for extracting data from different types of data sources. Using an Integration Services source, you can extract data from flat files, XML files, Microsoft Excel workbooks, and files that contain raw data. You can also extract data by accessing tables and views in databases and by running queries. For more information, see Integration Services Sources and Extracting Data with Sources.
Sources use connection managers to connect to data sources. You can add and configure a connection manager when you configure the source, or you can add the required connection managers to the package before you start to construct the data flow.
Integration Services provides a variety of transformations for modifying data, performing business intelligence operations, and for splitting, copying, and merging data. Using an Integration Services transformation, you can modify values in columns, look up values in tables, clean data, and aggregate column values. For more information, see Integration Services Transformations and Transforming Data with Transformations.
Some transformations use connection managers. For example, the Lookup transformation uses a connection manager to connect to the database that contains the lookup data. You can add and configure a connection manager when you configure the transformation, or you can add the required connection managers to the package before you start to construct the data flow.
Integration Services provides different destinations for loading data into different types of data stores. Using an Integration Services destination, you can load data into flat files, process analytic objects, and provide data to other processes. You can also load data by accessing tables and views in databases and by running queries. For more information, see Integration Services Destinations and Loading Data with Destinations.
Destinations use connection managers to connect to data sources. You can add and configure a connection manager when you configure the destination, or you can add the required connection managers to the package before you start to construct the data flow.
You connect data flow components by connecting the output of sources and destinations to the input of transformations and destinations. When constructing a data flow you typically connect the second and subsequent components as you add them to the data flow. After you connect the component, the input columns are available for use in configuring the component. When no input columns are available, you will have to complete the configuration of the component after it is connected to the data flow. For more information, see Integration Services Paths and Connecting Components with Paths.
Many Integration Services sources, transformations, and destinations support error outputs. If the data flow component supports an error output, you can specify the effects that truncations or errors in each column have on the run-time behavior of the component. The error output can be connected to transformations that apply additional transformations or direct data to a different destination. For more information, see Handling Errors in the Data Flow.
Many data flow components connect to data sources, and you must add the connection managers that the components require to the package before the component can be configured correctly. You can add the connection managers as you construct the data flow, or before you start to construct the data flow. For more information, see Integration Services Connections and Adding Connection Managers.
SSIS Designer includes annotations that you can add to a data flow. Adding annotations to the design surface helps make packages self-documenting. For more information, see Using Annotations in Packages.
Presentation, How Microsoft IT Leverages SQL Server 2008 SSIS Dataflow Engine Enhancements, on sqlcat.com.