Extract, Transform, and Load (DW)---a Technical Reference Guide for Designing Mission-Critical DW Solutions
Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions. |
Microsoft SQL Server Integration Services (SSIS) is an ETL platform for enterprise-level data integration and data transformation solutions (see SQL Server Integration Services1). Enterprise class development is done via Microsoft Visual Studio, introducing Integration Service Projects as a part of BI solutions. Ad-hoc data imports and exports can be facilitated via SSIS Export/Import Wizard (see Using the SQL Server Import and Export Wizard to Move Data2).
Best Practices
The following resources provide reference material and additional information.
Considerations for High Volume ETL Using SQL Server Integration Services 4
Project REAL:
Establish ETL framework. Same patterns need to be used in packages for loading data into staging, for manipulating data in operational data layer, for dimension, and fact loads as well. This framework will also provide for including ETL lineages and data auditing as it gets loaded and transformed from sources to destinations.
SSIS Framework design is detailed in SQL Server 2008 Integration Services Problem—Design—Solution book on page 33. Another helpful reference to examples of SSIS ETL framework is Stonemeadow Solutions ETL Framework.7
Designing Your SSIS Packages for Parallelism (SQL Server Video) 9
SQL Server Integration Services – SSIS 10 contains multiple links to specific SSIS topics with samples.
Planning the SQL Server ETL implementation strategy using SSIS for Extracts 13
Designing and Tuning for Performance your SSIS packages in the Enterprise (SQL Video Series) 15
Case Studies and References
Examples of successful architectures are described in the following case studies and white papers:
Questions and Considerations
This section provides questions and issues to consider when working with your customers.
Determine all connectivity requirements (e.g. connection managers, OLEDB, ODBC, JDBC, and so on).
Build a "sandbox" source system to investigate source system data more effectively.
Establish source version control. Ideally use Microsoft Team Foundation Server. Regardless of the tool used for version control, all versions to SSIS packages need to be archived and accessible in the same manner as all other code in a BI solution.
Introduce Development, Testing, Staging, and Production environments. Make sure versions of SQL server are kept in sync across environments. Implement change control process for ETL components when promoting from Development to Production. In deployment plans always account for rollback procedures.
Build source to target mapping document including relevant business rules for data transformations. Plan this aspect carefully. Volatile source systems (those that change frequently) will require planned effort to keep mapping up-to-date. (See BI Documenter.17)
Will data profiling be required? If so, use SSIS Data Profiling Task.18
Implement SSIS configuration management using combination of environment variables, XML configuration files, and SQL Server configurations. References included on page 38 of SQL Server 2008 Integration Services Problem—Design—Solution book.
Will you have to develop a custom ADO.NET provider to access the source system?
Establish a solid naming convention for SSIS packages, and for SSIS components. Consistency in implementing naming conventions is very important for SSIS auditing and reporting on lineages, as well as for reusability of .dtsx files among developers.
Create SSIS package "templates" with standard variables used by all ETL jobs.
Make sure to use the "OnPostExecute" event handler in the SSIS packages.
Define strategy for deployment and storage. Reference is in Chapter 3 (page 75) of SQL Server 2008 Integration Services Problem—Design—Solution book.
Is there any de-duplication of data required?
What is the default strategy for a package error handler?
How will changes to dimension tables be handled?
Understand source database design (Star Schema, normalized, packaged).
Determine load sequence, frequency, how much data history.
Determine partitioning strategy for large tables.
Appendix
Following are the full URLs for the hyperlinked text.
1 SQL Server Integration Services https://msdn.microsoft.com/en-us/library/ms141026.aspx
2 Using the SQL Server Import and Export Wizard to Move Data https://msdn.microsoft.com/en-us/library/ms141209.aspx
3 The Data Loading Performance Guide https://msdn.microsoft.com/en-us/library/dd425070.aspx
4 Project REAL: Business Intelligence ETL Design Practices https://technet.microsoft.com/en-us/library/cc966422.aspx
5 DWMantra.com http://www.dwmantra.com/dwconcepts.html
6 Project REAL—Business Intelligence in Practice https://www.microsoft.com/sqlserver/2005/en/us/project-real.aspx
7 Stonemeadow Solutions ETL Framework http://etlframework.codeplex.com/
8 Top 10 SQL Server Integration Services Best Practices http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx
9 Designing Your SSIS Packages for Parallelism (SQL Server Video) https://msdn.microsoft.com/en-us/library/dd795221.aspx
10 SQL Server Integration Services – SSIS http://www.sqlis.com
11 SQL Server Integration Services (SSIS) – Best Practices http://www.mssqltips.com/tip.asp?tip=1840
12 Architecture of Integration Services https://msdn.microsoft.com/en-us/library/bb522498.aspx
13 Planning the SQL Server ETL implementation strategy using SSIS for Extractshttp://www.mssqltips.com/tip.asp?tip=1923
14 SSIS architecture: Tips for package design http://www.infoworld.com/d/data-management/ssis-architecture-tips-package-design-530
15 Designing and Tuning for Performance your SSIS packages in the Enterprise (SQL Video Series) http://sqlcat.com/presentations/archive/2009/05/02/designing-and-tuning-for-performance-your-ssis-packages-in-the-enterprise-sql-video-series.aspx
16 SQL Server Integration Services Product Samples http://msftisprodsamples.codeplex.com
17 BI Documenter http://bidocumenter.com/Public/Screenshots.aspx#
18 Data Profiling Task https://msdn.microsoft.com/en-us/library/bb895263.aspx