Microsoft Data Sources (DW)---a Technical Reference Guide for Designing Mission-Critical DW Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

Microsoft SQL Server often serves as the source of both transactional (OLTP) data as well as existing data warehouse data. Key distinguishing features primarily focus on the connectivity to SQL Server as well as the geographical proximity of the SQL Server source to the Data Warehouse target. Other important items include volume, frequency of extract/load, complexity of data transformation, access windows to extract data. In addition to SQL Server, Microsoft Excel is often used as a source for data warehouses.

Best Practices

The following resources provide reference material and additional information.

Case Studies and References

Questions and Considerations

When dealing with Microsoft data sources following important points should be taken in consideration.

  • Determine accessibility to all data sources:

    • Are they online? If so, what is the latency (how frequently updated)? Are there "windows" of time that you will have to consider to access them?

    • Are there any access security restrictions? Is access to SQL server provided via Windows or SQL authentication? If account used to access source data has privileges greater than reader ensure that security policies are defined and in place to prevent corruption of source data.

    • How "load-ready" is the data (i.e. how much transformation/enrichment is required for loading to DW)?

  • Determine all connectivity requirements (e.g. connection managers, OLEDB, ODBC, JDBC, and so on).

  • Make sure that development, testing, staging, and production environments are established for proper development and deployment. For more detail please refer to TARG document ETL SSIS.

  • Ensure that versions, service packs, and hotfixes for operating system and SQL Server are identical across environments.

  • If SQL Server Destination in SSIS is used to bulk load data into SQL Server tables and views, keep in mind that SQL Server destination cannot be used in packages that access a SQL Server database on a remote server. In this case, packages should use OLE DB destination.

  • Understand source database design (Star Schema, or normalized).

  • Understand if source and destination operating systems and database engines run 32 or 64 bit versions of operating systems and SQL engine. This however, shouldn’t be an issue if directions under point 4 are implemented. For more detail please refer to "64 bit Considerations for Integration Services.10

  • Understand if the client will rely on Data Extract (e.g. BCP) versus ETL/ELT (e.g. SSIS).

  • For BCP Extracts, determine where data will land:

  • Determine ETL/ELT requirements with regards to:

    • Data volume

    • Impact of large data volumes to corporate network?

    • Data "latency" or timing (how frequently does data need to be loaded)

    • Transformation complexity (from simple table lookups to complex business-oriented /data enrichment transforms)

  • Determine geographical proximity of SQL Server source and target systems.

  • Understand characteristics of the target (is it another SQL Server instance, or a flat file, or another RDBMS or analytical platform (e.g. SSAS)?).

  • Consider EDW architecture to include Staging (Landing) database, Operation Data Layer, Data Marts, and EDW. Also, assess if it is appropriate to manage common dimensions in a separate database (Conformed Dimensions). For more detail refer to slide 58 – ETL SSIS.

  • For Excel sources consult the following MSDN references: How to: Connect to an Excel Workbook12 and Excel Source.13

Appendix

Following are the full URLs for the hyperlinked text.

1 bcp Utility https://msdn.microsoft.com/en-us/library/aa174646(v=sql.80)

2 Using the SQL Server Import and Export Wizard to Move Data https://msdn.microsoft.com/en-us/library/ms141209.aspx

3 Architecture of Integration Services https://msdn.microsoft.com/en-us/library/bb522498.aspx

4 Top 10 SQL Server Integration Services Best Practices http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

5 Bulk Loading Data into a Table with Concurrent Queries http://sqlcat.com/technicalnotes/archive/2009/04/06/bulk-loading-data-into-a-table-with-concurrent-queries.aspx

6 Extract, transform, load http://www.answers.com/topic/extract-transform-load

7 How to: Loop through Excel Files and Tables by Using a Foreach Loop Container https://msdn.microsoft.com/en-us/library/ms345182.aspx

8 SSIS and Data Sources https://social.technet.microsoft.com/wiki/contents/articles/ssis-and-data-sources.aspx

9 Data Source (SSIS) https://msdn.microsoft.com/en-us/library/ms141792.aspx

10 64-bit Considerations for Integration Services https://msdn.microsoft.com/en-us/library/ms141766.aspx

11 ETL Subsystem 31: Paralleling and Pipelining http://blog.todmeansfox.com/tag/multithread/

12 How to: Connect to an Excel Workbook https://msdn.microsoft.com/en-us/library/cc280527.aspx

13 Excel Source https://msdn.microsoft.com/en-us/library/ms141683.aspx