Chapter 8 - Developing: Databases - Migrating the Data

Article
12/05/2007

Introduction and Goals

Migration of the data represents the final task in the migration of an Oracle database to Microsoft® SQL Server™. Even though SQL Server offers several tools that make the bulk transfer of data from an external source easier, aspects of the migration that cannot be automated, such as planning the migration and validating that the data has been moved completely and without error, should be prioritized.

The data migration task can be broken up into the following three subtasks:

Planning. It is important to understand the migration options, evaluate the characteristics of the source data, and evaluate environmental and business constraints.
Execution. This subtask involves preparing the database and transferring the data.
Validation. This subtask accounts for all the data and verifies data integrity.

Planning the Data Migration

There are two planning prerequisites. Before making any decisions, you should first fully understand the various options available for transferring the data from an Oracle database to a SQL Server database, particularly the advantages and limitations of each of the options. The second prerequisite is to document all the factors or characteristics of the original environment that could influence your decision making from the available options. These prerequisites are discussed in detail under the following headings.

Options for Migration

In an Oracle to SQL Server migration scenario, the two avenues for migrating data are to either push the data from Oracle to SQL Server, or to pull the data from Oracle into SQL Server.

Even though Oracle can communicate with heterogeneous databases, the operation is only at a transactional level and cannot be taken advantage of for bulk data operations as needed in a migration situation. The only specialized utility Oracle provides for exporting data out of the database is the export utility exp. This utility can only create dumps in binary format that cannot be used for importing into a non-Oracle database.

SQL Server is built for bulk copying of data, with specialized features and interfaces to a large host of data sources. Figure 8.1 shows the various paths that offer bulk copy functions for moving data from Oracle.

Figure 8.1 Paths for accessing an Oracle database from SQL Server

Figure 8.1 Paths for accessing an Oracle database from SQL Server

Data can be migrated from Oracle to SQL Server using one of the following options:

Bulk Copy Program (bcp). bcp is a command line utility that uses the ODBC bulk copy API in SQL Server. bcp cannot connect to an Oracle database and, when employed for moving data from Oracle to SQL Server, the data has to be an ASCII text file. Multiple copies of bcp can be run in parallel while working on the same target table. The number of parallel sessions is limited by the number of CPUs on the server.

For information about using the bcp utility with Microsoft SQL Server, refer to the "bcp Utility" article available at https://msdn.microsoft.com/library/default.asp?url=/library/en-us/coprompt/cp_bcp_61et.asp.
BULK INSERT. BULK INSERT is a T-SQL statement. This statement has the same kind of functionality as bcp and can be used to import Oracle data that has been captured in a text file. This operation scales linearly with the number of CPUs, but it is limited to one thread per CPU.

For information on the complete BULK INSERT T-SQL statement and its usage, refer to https://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ba-bz_4fec.asp.
Data Transformation Service (DTS). DTS is a powerful set of GUI tools that can be used to extract, transform, and populate SQL Server databases from Oracle databases. DTS wizards make the entire process of defining and performing the import very easy. DTS provides access to the BULK INSERT statement in the DTS Bulk Insert task. In addition to working with text files, DTS can use OLE DB and ODBC providers to connect to an Oracle database. The direct connectivity to Oracle provides the capability to work with binary large objects (BLOBs).

For an overview of DTS, refer to https://msdn.microsoft.com/SQL/sqlwarehouse/DTS/default.aspx?pull=/library/en-us/dnsql2k/html/dts_overview.asp.

Even though the functionality to transform data during the migration exists, an assumption made during this discussion is that no transformation of the source data is required. The BULK INSERT T-SQL statement has been documented in SQL Server Books Online as the fastest bulk copy method because it executes with the SQL Server process. In contrast, bcp and DTS are external utilities that are running as their own processes. As a result, they have the overhead of having to communicate with SQL Server (using IPC) and pass it data from the source. When bcp and DTS are run on a client computer, there is network overhead as well.

bcp is faster than DTS when importing very large quantities of data into SQL Server, but it lacks some of the features of DTS. Studies have ranked bcp first in speed, followed by BULK INSERT, and then DTS. Refer to the SQL Server Magazine article "Showdown — bcp vs. DTS" available at https://www.winnetmag.com/Articles/Print.cfm?ArticleID=19760. Other factors must be considered in making a migration strategy decision. These factors are discussed under the following heading.

The Microsoft SQL Server Migration Assistant (SSMA) provides the Schema and Data Migrator tool, which can migrate the schema and also move the data. The data is migrated using the SQL Server Linked Server feature to connect directly to Oracle through the Oracle Gateway. For more information on the Schema and Data Migrator and to download the tool refer to https://www.microsoft.com/sql/migration. The beta version of this tool is available as of the date of publishing this solution. Version 1.0 of the tool is slated to be available in June 2005.

Factors in Migration

Given the various options for migrating data from Oracle to SQL Server, the choice is influenced by several business and environment factors. Some of the important factors (the more critical ones stated first) are:

Volume of data. The amount of data that has to be migrated is the number one factor. High volumes of data require more storage, more processing power, larger network bandwidth, larger migration window, and increased risk of failure.

For more information on how to optimize large data imports in SQL Server, refer to https://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1a_5gyt.asp.
Migration window. The time window available for migration may necessitate multiple parallel sessions, more resources, and staging of data.
Type of data. The existence of BLOBs can be handled only by DTS.
Server processing capacity. Running bcp on the same server as the database reduces network overhead of bcp talking to SQL Server, but it consumes CPU on the server.
Storage availability. The amount of storage available in both the source and target environment influences the choice of migration method and strategy. For example, moving the text files on to the target server reduces network overhead. The configuration of storage also affects the speed of migration, such as putting the source text file and the database files on separate disks or devices.
Data source. The capability to create flat files in the source environment affects the choice of migration method. A fixed-field format source file takes a lot more storage and is considerably slower. Delimited format is recommended.
Type of database. Batch, OLTP, or DSS database type defines the type of schema objects and their characteristics.
Recovery Model. Database recovery models should be set to simple or bulk-logged to minimize logging of BULK INSERT. The capability to do so can be affected if there are databases being shared by other applications that are in use during the migration.
Use proven methods. Only use methods and options that have proven to work by the industry and for the environment. For example, OLE DB has been proven to be faster than ODBC while providing all of the same features.

Migration Strategy

The following are some of the factors that can contribute to the creation of a robust migration strategy:

Minimize logging. The actions that should be taken to minimize logging are:
- Set the database recovery model to bulk-logged or simple.
- Use TABLOCK hint during bulk loading.
- The target table should either be empty or no indexes should be defined on it.
- Disable or drop all triggers on the target table.
- Drop replication on the target table.
Reduce operations overhead. You should drop indexes, constraints, defaults, and identity operations. When a clustered index is planned for a table, order the data while creating the output file from Oracle. The BULK INSERT statement accepts the ORDER clause and the bcp utility accepts the ORDER hint to specify the sort of the data files. Avoid concurrent access by other users to the tables being loaded. The TABLOCK hint can be used while performing BULK INSERT or bcp.
Improve parallelism. With a large amount of data (which is the assumption made throughout this section), the best performance can only be achieved by performing loads in parallel. The parallelism can be inter-table or intra-table.
Avoid side effects. Turn off triggers, bypass identity, bypass default column setting, and drop replication.
Reduce complexity. When the source database has a large number of tables, break it down into logical subsets and stage data, if necessary. Importing static and historical data can be performed in advance.
Reduce risk. Break down larger source data text files using either an editor or by specifying ranges in the SQL queries that are used for exporting the data into spool files. Any failures that occur when dealing with large files waste time and resources. In a migration containing very volatile data, the migration should be completed in a single stage instead of multiple stages.
Commit more resources. Additional CPU, memory, and storage should be acquired for the migration.
Auditing. Include audit steps during the process. This enables trapping of errors as they occur during the transfer of data.

Table 8.1 details the suitable method of data porting for different migration environments.

Table 8.1: Data Migration Options

Scenario	Recommended method
Data migration environments that have high performance requirements and have huge volumes of data to be transferred	bcp or BULK INSERT
Methods of data migration that use a small amount of scripts	bcp
Applications that need to use scripts on the data transfer process	BULK INSERT
Applications that require high-powered transformation of moderate volumes of data	DTS
Tables contain large objects (LOB) such as RAW, LONG, BLOB, and CLOB	DTS

Here are some additional considerations for planning the migration:

The final strategy should be based on the amount of time available for the actual deployment and cutover.
The performance of the load operation should be dependent on the chosen strategy.
Instead of sticking to a single method, a combination of methods can be employed based on the strengths and weaknesses of the individual methods. For example, DTS need not be used as the migration method just because there are LOB columns.
The tables in the migration can be broken up and a combination of methods could be used, taking advantage of the strengths of each.
Parallel operation should be applied on large tables.
The performance of bcp versus BULK INSERT has been found to vary. Test both these approaches in your scenario before making a decision.
Test data files should be created and the performance of the various methods measured in the target environment.
A prototype of the final strategy should be created and the process should be practiced and perfected before deployment.

Executing the Data Migration

This section details the steps to be performed for migrating data. The section has been broken into three tasks:

Pre-implementation
Implementation
Post-implementation

Each of these tasks is discussed in detail under the following headings.

Pre-Implementation Tasks

The following tasks have to be completed before performing the migration:

Back up the database.

Delete any test data from the target database and create a full backup of the empty database. Restoring the backup will be faster than the work involved in rolling back a failed or incomplete migration.
Export Oracle data into text files.

Data from each table that has to be migrated is exported into comma-delimited ASCII text files.

Note The string data should be optionally enclosed in double quotation marks (") in case there are commas in the column data. The date fields can be exported in the desired format using the TO_CHAR function.

Sample:

In SQL*Plus, connect to the schema and execute the following command:
```
desc countries
```
The output is:
```
Name                                Null?    Type        
```

COUNTRY_ID NOT NULL CHAR(2) COUNTRY_NAME VARCHAR2(40) REGION_ID NUMBER

Create a .sql file (example countries.sql) with the following content:

<pre IsFakePre="true" xmlns="https://www.w3.org/1999/xhtml">set pages 0

set lines 10000 set trimspool on set feedback off spool /data/dump/countries.csv select country_id||','||country_name||','||region_id from countries; spool off

Executing this file using SQL\*Plus produces the following text file that can be imported into SQL Server.

**Note** If these commands are typed at the command prompt inside the SQL\*Plus utility in Microsoft Windows®, the output file will contain extraneous prompt commands at the beginning and end of the file. This does not occur if the commands are executed from a file.

<pre IsFakePre="true" xmlns="https://www.w3.org/1999/xhtml">AR,Argentina,2

AU,Australia,3 BE,Belgium,1 BR,Brazil,2 CA,Canada,2 CH,Switzerland,1 CN,China,3 DE,Germany,1 DK,Denmark,1 EG,Egypt,4 FR,France,1 HK,HongKong,3 IL,Israel,4 IN,India,3 IT,Italy,1 JP,Japan,3 KW,Kuwait,4 MX,Mexico,2 NG,Nigeria,4 NL,Netherlands,1 SG,Singapore,3 UK,United Kingdom,1 US,United States of America,2 ZM,Zambia,4 ZW,Zimbabwe,4

**Note** Using SQL\*Plus and the **spool** command is the best way to create comma-separated (CSV) exports of Oracle table data. Other solutions available based on the functionality currently available in Oracle are DBMS\_OUTPUT, UTL\_FILE, and Pro\*C. For details on the various methods, refer to <https://asktom.oracle.com/~tkyte/flat/index.html>  
However, these options should be employed only if the data is being manipulated during the output process because there are performance overheads in their use.

Set the recovery model of the SQL Server database.

Change the recovery model of the target SQL Server database to bulk-logged or simple. Bulk-logged mode is preferred.

The following command sets the recovery model of a database to bulk-logged:
```
ALTER DATABASE database_name
```

SET RECOVERY BULK_LOGGED

where

<pre IsFakePre="true" xmlns="https://www.w3.org/1999/xhtml">database_name</pre>

is the name of the database whose recovery model has to be set.

Disable all constraints.

Identify and disable all constraints, including CHECK, PRIMARY KEY, FOREIGN KEY, and UNIQUE. Create scripts to restore them.

In SQL Server, only FOREIGN KEY and CHECK constraints can be disabled.

The following command can be used to disable FOREIGN KEY and CHECK constraints of a SQL Server table:
```
ALTER TABLE table_name NOCHECK CONSTRAINT ALL
```
where
```
table_name
```

is the name of the table whose constraints have to be disabled.

PRIMARY KEY and UNIQUE constraints have to be dropped.

The following command can be used to drop PRIMARY and UNIQUE by specifying their name:

<pre IsFakePre="true" xmlns="https://www.w3.org/1999/xhtml">ALTER TABLE table_name DROP CONSTRAINT constraint_name</pre>


where

<pre IsFakePre="true" xmlns="https://www.w3.org/1999/xhtml">constraint_name</pre>

is the name of the constraint and FakePre-788d2f6d984a401ba8fcb80d546d0a86-513ea020166e4584add3a7c1f35a2775 is the name of the table in which the constraint exists.

**Note** By default, constraints are not checked when using **bcp**.

Disable all triggers.

Disable triggers on all tables into which data will be loaded.

The following command can be used to disable all triggers on a table.
```
ALTER TABLE table_name DISABLE TRIGGER ALL
```
where
```
table_name
```

is the name of the table on which the triggers have to be disabled.

Drop all indexes.

Drop all remaining indexes using the following command after saving the script to recreate them.
```
DROP INDEX table_name.index_name
```
where
```
index_name
```

is the name of the index to be dropped and FakePre-0ce2e763d553414a80e2336a84789737-260670535f114979b80f9ee3ebdad0d3 is the table on which the index exists.

Handle IDENTITY column inserts.

Let explicit values be copied into columns of a table with the Identity property.
```
SET IDENTITY_INSERT table_name ON
```
This setting can be used only on a single table at a time. So ensure that this is set before importing a table, or use the functionality in the method being used. In bcp, the same thing can be achieved using the -E switch. In BULK INSERT, use the KEEPIDENTITY argument.
```
BULK INSERT 'table_name' FROM 'data file' WITH KEEPIDENTITY
```
Back up the database.

Create a full backup after the data migration before performing any validation or testing.

Implementation Tasks

The implementation is the actual transfer of data from the source Oracle database to the target SQL Server database. The implementation can be performed in one step or in two steps. In the single step, implementation data is streamed directly from the Oracle database to the SQL Server database. In the two step process, data is extracted from the Oracle database into files and then loaded into the SQL Server database.

SQL Server provides the following three tools for loading data into a database:

The bulk copy program (bcp)
The BULK INSERT statement
Data Transformation Services (DTS)

The use of these three tools in transferring data into SQL Server is discussed under the following headings.

Using bcp

The bcp utility can be accessed from the command prompt. The syntax for use is provided here:

bcp database_name.owner.table_name in data_file –t field_terminator –S server_name\instance_name –U username –P password –c –t ,

where

database_name.owner.table_name

is the fully qualified name of the table into which the data is to be loaded, FakePre-eabf674f039447d98efb2d141abbe4b3-99225c94e9c440098b199f02aa5c2aef is the name of the text file containing the data, FakePre-d87dd35a7ec1489c8aa20af971b5ca30-0918f4d1be2b467f8810131c0ca15ed3 is the character that is used to delimit columns in a row, FakePre-603c0007020e40b1aa6a37268f047629-28c1e7fdbef84ec5ae7901d47de2d8d8 is the SQL Server instance, FakePre-866072b3c65547d8a409df717258a387-db79791c16d64a5ca52c863f9ef02516 is the login name, and FakePre-9c9199f656314b429616b15547da7509-df4140098bd3483e9d2f503ae151adfe is the password for the user login.

-c specifies that the data in the file is in the character data type

-t specifies the field terminator

Figure 8.2 shows the results of executing this command at the Windows command prompt.

Figure 8.2 Importing data from a text file using bcp

To make the bcp process repeatable, a format file should be created. The format file can be created interactively by executing the bcp command, as shown in Figure 8.3.

Figure 8.3 Creating a format file for use with bcp imports

The format file looks like this:

8.0
3
SQLCHAR     0   2   ","     1    COUNTRY_ID     SQL_Latin1_General_CP1_CI_AS
SQLCHAR     0   40  ","     2    COUNTRY_NAME   SQL_Latin1_General_CP1_CI_AS
SQLNUMERIC  0   19  "\r\n"  3    REGION_ID      ""

Refer to the topic "Using Format Files" in SQL Server Books Online for more information about format files.

Note The format file will have to be modified to handle text files that contain column data values and which are enclosed by quotation marks (").

The format file can be used for future imports into the same table by using the -f switch. This is illustrated in Figure 8.4. In the example, the format file is assumed to have been copied to C:.

Figure 8.4 Importing data from a text file using bcp format file

Table 8.2 contains the additional arguments that are relevant while importing large volumes of data from Oracle.

Table 8.2: Additional Arguments of the bcp Utility

Argument (or option)	Description
-k	bypasses DEFAULT definition
-h "hints"	bulk copy hints
-E	allows Identity insert
-a	network packets size 512 to 65536 (default 4096)
-e error_file	logs errors during the import
-b batch_size	uses default, which is all rows in the file

Note Hidden characters in text files can cause problems such as the "unexpected null found" error. Care has to be taken when opening and manipulating text files containing data.

Using BULK INSERT

The BULK INSERT T-SQL statement provides the same functionality as the bcp utility. The major difference is that this functionality is from within a database session.

The following statement achieves the same work as was demonstrated with bcp:

BULK INSERT hrdata.hr.countries FROM 'c:\countries.csv'
WITH (
   DATAFILETYPE = 'char',
   FIELDTERMINATOR = ',',
   ROWTERMINATOR = '\n'
)

Note Only members of the sysadmin role can use BULK INSERT.

Table 8.3 contains additional arguments that need to be used while importing large amounts of Oracle data.

Table 8.3: Additional Arguments Available with the BULK INSERT Command

Argument (or option)	Description
-k	bypasses DEFAULT definition
TABLOCK	locks
KEEPIDENTITY	allows values to be specified for identity column
FIRE_TRIGGERS	trigger execution is not suppressed
CHECK_CONSTRAINTS	constraints on table have to be checked
BATCHSIZE	use default, which is all rows in the file

Note For more information about the bulk copy operations and on improving its performance, refer to "Optimizing Bulk Copy Performance" at https://msdn.microsoft.com/library/en-us/optimsql/odp_tun_1a_5gyt.asp?frame=true, and "Bulk Copy Performance Considerations" at https://msdn.microsoft.com/library/en-us/adminsql/ad_impt_bcp_5zqd.asp.
For information on different switches available with Bulk Insert, refer to https://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ba-bz_4fec.asp.

Using DTS

This section is used to demonstrate how DTS wizards can be used to import data into SQL Server tables from text files and directly from an Oracle database.

The DTS Import/Export Wizard can be launched from Enterprise Manager using the following steps or using the DTS Import Wizard:

Expand Server.
Right-click Databases, click All Tasks, and then click Import Data.
Click the Next button on the wizard's Welcome screen.
The next window presented is for choosing the data source. In the Data Source drop-down list choose Text File. Pick the file containing Oracle table data in the File name text box.
The file format characteristics are chosen in the next window. The file type should be ANSI, and the row delimiter should be {CR}{LF}. If column data values are quoted, it can be specified in the Text qualifier field. The delimiter character, which is a comma (,), is specified in the next screen.
The Microsoft OLE DB Provider for SQL Server should be chosen as the destination and the server and login information provided for the target SQL Server instance.

DTS offers the option of executing the import, saving it to a DTS package and scheduling. For information on how to schedule DTS packages, refer to https://msdn.microsoft.com/library/default.asp?url=/library/en-us/dtssql/dts_pkgmng_4hm6.asp.

DTS from Oracle Database

The steps in creating a DTS package to connect and import data directly from the Oracle database is very similar to the process described in the previous section. Instead of pointing the source to a file, an Oracle database is specified through an OLE DB or ODBC provider for Oracle.

The DTS Import/Export Wizard can be launched from Enterprise Manager using the following steps or using the DTS Import Wizard:

Expand Server.
Right-click Databases, click All Tasks, and then click Import Data.
Click the Next button on the wizard's Welcome screen.
The next window presented is for choosing the data source. In the Data Source drop-down list, choose the Microsoft ODBC Driver for Oracle. Enter the Oracle database alias and login information.
The Microsoft OLE DB Provider for SL Server should be chosen as the destination and the server and login information provided for the target SQL Server instance.
The next screen offers the option of copying data from a table (or view) or using a query.
If the table or view option is chosen, the next screen provides a list of tables and views available for import. The destination table name and transformation information has to be provided along with the source table/view.
If query option is chosen, the next screen provides a window for entering the query statement. The destination table and transformation can be entered in the next screen.
The execution of the import can be done immediately or scheduled for a later time.

Note DTS has the capability of reading the table description from the Oracle database and creating tables in SQL Server. The wizard has been found to be faulty in its choice of equivalent data types for SQL Server tables.

Data can be migrated directly out of the database instead of through an intermediary file if the interconnect between the servers hosting the two databases is very fast. Also, no additional storage is required for the data files on the two servers. Use of files can reduce the deployment time if the data export can be done in advance and staged for import.

For a case study on incrementally loading tables using the SQL Server bulk load options, refer to https://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx.

Post-Implementation Tasks

The following tasks have to be performed to ensure the success of the data transfer. The tasks performed here undo the changes that were made to the schema for affecting the data transfer, such as disabling of constraints and triggers. Chapter 9, "Developing: Databases — Unit Testing the Migration" has a more detailed discussion of performing the first two tasks as a part of testing the migrated database.

Validate the data migration.

The validation tasks are discussed in detail in the "Validating the Data Migration" section later in this chapter.
Re-enable the constraints.

Enable all constraints, including CHECK, PRIMARY KEY, FOREIGN KEY, and UNIQUE. Use the scripts that were created while disabling the constraints.

FOREIGN KEY and CHECK constraints can be enabled using the following syntax:
```
ALTER TABLE table_name CHECK CONSTRAINT ALL
```
where
```
table_name
```

is the table on which the CHECK constraints have to be enabled.

PRIMARY KEY and UNIQUE constraints will have to be recreated using the original DDL. Oracle allows multiple rows where all columns (nullable) of a UNIQUE constraint have NULL values. SQL Server treats NULL as duplicates (instead of an unknown value) for the purpose of uniqueness. Hence creating a UNIQUE constraint would return an error if such a condition exists. Because it would be difficult to create values, the best way to maintain the status quo would be to replace the UNIQUE constraint with a trigger that accomplishes the same thing.

Recreate the indexes.

The constraint indexes can be recreated using the saved scripts. Multiple indexes on a single table can be created in parallel using different client sessions.
Enable the triggers.

Identify and enable all triggers by using the following syntax:
```
ALTER TABLE table_name ENABLE TRIGGER ALL
```
where
```
table_name
```

is the table on which the triggers are to be enabled.

Set the recovery model.

Restore the recovery model of the SQL Server database to the original setting (normally full) by using the following syntax:
```
ALTER DATABASE database_name
```

SET RECOVERY FULL

Capture data statistics.

Perform the steps necessary to capture data statistics required by the optimizer.
Back up the database.

Create a full backup of the entire server, including the user databases. This is the first valid backup of the migrated database.

Validating the Data Migration

The following major tasks have to be performed to validate the migrated data:

Verify the data transfer.

The logs or audit files for the data transfers should be checked for errors or failures. After this process is complete, row counts of every table in the destination SQL Server database should be matched to the source Oracle database. If any discrepancy is found in the counts, troubleshooting should include identifying the missing records and viewing the logs and audit files for error messages that can provide reasons for the failure. The data transfer process should take into consideration any changes in the logical model between the source Oracle database and SQL Server. A more stringent check is difficult for large databases. More validation of the data transfer occurs in the next step as part of the data integrity checks.
Validate the data integrity.

Integrity is automatically checked when creating or enabling constraints. The WITH CHECK clause has to be used when adding constraints. Data integrity checking is threatened when migrating databases which lack proper primary key constraints and foreign key constraints. For more information on the functionality available with check constraints, refer to https://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_aa-az_3ied.asp.

In many cases, there are no strict equivalents of Oracle datatypes in SQL Server. Hence the integrity of such data has to be tested to ensure that there is no rounding or truncation, and that the exact value is stored.