Extract, Transform, and Load (DW)---a Technical Reference Guide for Designing Mission-Critical DW Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

Microsoft SQL Server Integration Services (SSIS) is an ETL platform for enterprise-level data integration and data transformation solutions (see SQL Server Integration Services1). Enterprise class development is done via Microsoft Visual Studio, introducing Integration Service Projects as a part of BI solutions. Ad-hoc data imports and exports can be facilitated via SSIS Export/Import Wizard (see Using the SQL Server Import and Export Wizard to Move Data2).

Best Practices

The following resources provide reference material and additional information.

Case Studies and References

Examples of successful architectures are described in the following case studies and white papers:

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

  • Determine all connectivity requirements (e.g. connection managers, OLEDB, ODBC, JDBC, and so on).

  • Build a "sandbox" source system to investigate source system data more effectively.

  • Establish source version control. Ideally use Microsoft Team Foundation Server. Regardless of the tool used for version control, all versions to SSIS packages need to be archived and accessible in the same manner as all other code in a BI solution.

  • Introduce Development, Testing, Staging, and Production environments. Make sure versions of SQL server are kept in sync across environments. Implement change control process for ETL components when promoting from Development to Production. In deployment plans always account for rollback procedures.

  • Build source to target mapping document including relevant business rules for data transformations. Plan this aspect carefully. Volatile source systems (those that change frequently) will require planned effort to keep mapping up-to-date. (See BI Documenter.17)

  • Will data profiling be required? If so, use SSIS Data Profiling Task.18

  • Implement SSIS configuration management using combination of environment variables, XML configuration files, and SQL Server configurations. References included on page 38 of SQL Server 2008 Integration Services Problem—Design—Solution book.

  • Will you have to develop a custom ADO.NET provider to access the source system?

  • Establish a solid naming convention for SSIS packages, and for SSIS components. Consistency in implementing naming conventions is very important for SSIS auditing and reporting on lineages, as well as for reusability of .dtsx files among developers.

  • Create SSIS package "templates" with standard variables used by all ETL jobs.

  • Make sure to use the "OnPostExecute" event handler in the SSIS packages.

  • Define strategy for deployment and storage. Reference is in Chapter 3 (page 75) of SQL Server 2008 Integration Services Problem—Design—Solution book.

  • Is there any de-duplication of data required?

  • What is the default strategy for a package error handler?

  • How will changes to dimension tables be handled?

  • Understand source database design (Star Schema, normalized, packaged).

  • Determine load sequence, frequency, how much data history.

  • Determine partitioning strategy for large tables.

Appendix

Following are the full URLs for the hyperlinked text.

1 SQL Server Integration Services https://msdn.microsoft.com/en-us/library/ms141026.aspx

2 Using the SQL Server Import and Export Wizard to Move Data https://msdn.microsoft.com/en-us/library/ms141209.aspx

3 The Data Loading Performance Guide https://msdn.microsoft.com/en-us/library/dd425070.aspx

4 Project REAL: Business Intelligence ETL Design Practices https://technet.microsoft.com/en-us/library/cc966422.aspx

5 DWMantra.com http://www.dwmantra.com/dwconcepts.html

6 Project REAL—Business Intelligence in Practice https://www.microsoft.com/sqlserver/2005/en/us/project-real.aspx

7 Stonemeadow Solutions ETL Framework http://etlframework.codeplex.com/

8 Top 10 SQL Server Integration Services Best Practices http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

9 Designing Your SSIS Packages for Parallelism (SQL Server Video) https://msdn.microsoft.com/en-us/library/dd795221.aspx

10 SQL Server Integration Services – SSIS http://www.sqlis.com

11 SQL Server Integration Services (SSIS) – Best Practices http://www.mssqltips.com/tip.asp?tip=1840

12 Architecture of Integration Services https://msdn.microsoft.com/en-us/library/bb522498.aspx

13 Planning the SQL Server ETL implementation strategy using SSIS for Extractshttp://www.mssqltips.com/tip.asp?tip=1923

14 SSIS architecture: Tips for package design http://www.infoworld.com/d/data-management/ssis-architecture-tips-package-design-530

15 Designing and Tuning for Performance your SSIS packages in the Enterprise (SQL Video Series) http://sqlcat.com/presentations/archive/2009/05/02/designing-and-tuning-for-performance-your-ssis-packages-in-the-enterprise-sql-video-series.aspx

16 SQL Server Integration Services Product Samples http://msftisprodsamples.codeplex.com

17 BI Documenter http://bidocumenter.com/Public/Screenshots.aspx#

18 Data Profiling Task https://msdn.microsoft.com/en-us/library/bb895263.aspx