Copying Data Between Different Collations

In Microsoft SQL Server data can be moved between Unicode columns seamlessly and conversion is unnecessary. Data moved between columns containing non-Unicode character data, however, must be converted from the source code page to the target code page. A code page is an ordered set of characters of a given script in which a numeric index, or code point value, is associated with each character. Code pages support character sets and keyboard layouts used by different Microsoft Windows locales. For a list of the supported code pages, see Code Page Architecture.

By default, character data is converted in the following manner:

Operation

Character data converted to:

Export

OEM code page characters

Import

ANSI/Microsoft Windows code page characters

Conversion between OEM and ANSI code pages causes the loss of extended or double-byte character set (DBCS) characters. To avoid such conversions in SQL Server, you can specify a code page or a collation.

Note

For information about transferring a database from one collation to another collation, see this Microsoft Web site.

In SQL Server 7.0 and later, the physical storage of character strings is controlled by collations. Setting collations is supported at four levels: the server instance, a database, a table column, and an expression. A collation specifies three properties:

  • The code page used to store non-Unicode character data.

  • The sort order to use for Unicode data types (nchar, nvarchar, and ntext). A sort order defines the sequence in which characters are sorted, and the way characters are evaluated in comparison operations.

  • The sort order to use for non-Unicode character data types (char, varchar, and text).

Note

Microsoft recommends that you specify a collation name for each column in a format file.

For an introduction to code pages and collations, see Collation and Unicode Support. For in-depth information, see Working with Collations.

Using a Column-Level Collation

In SQL Server 7.0 and later, you can control the conversion of data by specifying the collation in which the data is stored in the field of the target data file, for a bulk-export, or the column of the target table, for a bulk-import operation. Any required translations between the data file collation and the collations of the table columns in the database are performed internally by the import or export operation.

Note

For SQL Server 7.0 and later, specifying a code page is supported but it is preferable to specify the collation in a format file.

Collations always apply to SQLCHAR columns on both bulk-import and bulk-export operations. Column collation specifications are always ignored for columns that do not have either SQLCHAR or SQLNCHAR specified as their host data type. Collations are used to determine the sort order for both SQLCHAR and SQLNCHAR columns during bulk-import operations for which the columns are referenced in the ORDER hint.

To specify the collation, you must use a format file. Both types of format files support specifying column-level collations. For information on using collations in non-XML format files, see the following section, Specifying Column-Level Collations in Non-XML Format Files. For information on using collations in XML format files, see Understanding XML Format Files.

Specifying Column-Level Collations in Non-XML Format Files

The final column in a non-XML format file (the column in ordinal position 8) contains a collation specification that defines how the data for that column is stored in the data file. The collation-column options are as follows:

Option

Description

name

Specifies the name of the collation used to store the data in the data file. For a list of the SQL collation names, see SQL Server Collation Name (Transact-SQL).

RAW

Specifies that the data is stored in the code page that is specified in a code-page option in the command or the bcp_control BCPFILECP hint. If none of these is specified, the collation of the data file is that of the OEM code page of the client computer.

NoteNote
For more information about the code-page options, see "Using a Code Page," later in this topic. For more information, see about the BCPFILECP hint, see bcp_control.

""

Has the same meaning as RAW.

Examples

The following example shows a non-XML format file with column collations specified for fields 2 and 3. The collation information is displayed in bold.

9.0
4
1       SQLCHAR       0       7       "\t"     1     DepartmentID     ""
2       SQLCHAR       0       100     "\t"     2     Name             SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       100     "\t"     3     GroupName        SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR       0       24      "\r\n"   4     ModifiedDate     ""

Note

The preceding format file is for the HumanResources.Department table in the AdventureWorks2008R2 sample database, which has four columns: DepartmentID, Name, GroupName, and ModifiedDate.

For more information on non-XML format files, see Understanding Non-XML Format Files.

Using a Code Page

Note

To control the conversion of data, Microsoft recommends that you use a format file to specify column-level collations for a bulk-export or bulk-import operation. For more information, see. "Using a Column-level Collation," earlier section.

The syntax for specifying a code page is as follows:

  • To specify a code page in a bcp command, use the -C switch:

    -C { ACP | OEM | RAW | code_page }

  • To specify a code page in a BULK INSERT or INSERT ... SELECT * FROM OPENROWSET(BULK...) statement, use the CODEPAGE option:

    CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' }

Both the -C switch and the CODEPAGE option accept the following values for the code page.

Code Page Value

Description

ACP

During an import operation, specifies that data fields of the char, varchar, or text data type are converted from the ANSI/Windows code page (ISO 1252) to the SQL Server code page.

During an export operation, specifies that bcp converts these columns from the SQL Server code page to the ANSI/Windows code page.

OEM (default)

During an import operation, specifies that data fields of the char, varchar, or text data type are converted from the system OEM code page to the SQL Server code page.

During an export operation, specifies that bcp converts these columns from the SQL Server code page to the system OEM code page.

RAW

No conversion from one code page to another occurs. This is the fastest option.

code_page

A specific code page number (for example, 850). For a list of the supported code pages, see Code Page Architecture.

Examples

The following examples use the HumanResources.myTeam table in the AdventureWorks2008R2 database. Before you can run the examples, you must create this table. For information about the table and how to create it, see Creating the HumanResources.myTeam Table.

Note

Before testing the following example, you should delete the existing rows from the myTeam table to avoid primary-key conflicts.

A. Using a Code Page with bcp

This example uses bcp to bulk export the data from the HumanResources.myTeam table in the AdventureWorks2008R2 database to the myTeam850.txt data file, use the code page 850. At the Windows command prompt, enter:

bcp AdventureWorks2008R2.HumanResources.myTeam out myTeam850.txt -c -C850 -T

B. Using a Code Page with BULK INSERT

This example uses BULK INSERT to bulk import myTeam850.txt, created in the preceding bcp example, into the AdventureWorks2008R2 HumanResources.myTeam table.

From a query tool, such as SQL Server Management Studio Query Editor, execute:

USE AdventureWorks2008R2;
GO
DELETE HumanResources.myTeam;
GO
BULK INSERT HumanResources.myTeam 
   FROM 'C:\myTeam850.txt'
   WITH (CODEPAGE = 850);
GO

Bulk Exporting Data Between Different Collations

In a bulk-export operation, the collation specification controls the code page that is used to store character data in the data file. It applies to:

  • All columns in a character-format data file.

  • Any column in a native mode file where SQLCHAR is specified as the host file data type.

  • SQLCHAR characters whose values are greater than 127 or less than 32.

    Collations are applied to characters whose values are 32 through 127, but all code pages map the same characters to the values from 32 to 127, so applying different collations has no effect.

The rules for determining which collation or code page is used on a bulk export operation are as follows:

  • If a column collation is specified either in a format file or by using the ODBC bcp_setcolfmt function (bcp_setcolfmt), the character data is stored using the ANSI code page associated with the collation.

  • If a column collation was not specified, but a code page was specified by either a code-page option in the command or the bcp_control BCPFILECP hint, all SQLCHAR data from columns having no column collation specification is stored using the specified code page.

    Note

    For information about the code-page options, see "Using a Code Page," earlier in this topic. For more information, see about the BCPFILECP hint, see bcp_control.

  • If you do not specify either a collation or a code page, SQLCHAR data is stored using the OEM code page of the client computer.

Note

No information about the collation/code page is stored in a data file.

Bulk Importing Data Between Different Collations

For a bulk-import operation, code-page interpretation applies to columns stored as character-format, SQLCHAR, data in a data file. In a character-format data file, all columns are stored as SQLCHAR. Because no information about the collation/code page is stored in a data file, for a bulk-import operation, you must supply the information about the collation/code page of the data fields.

The rules for determining which collation or code page is used on a bulk-import operation are as follows:

  • If you specify a column collation in a format file or by using the ODBC bcp_setcolfmt function (bcp_setcolfmt), the SQLCHAR data in a data file is interpreted by using the ANSI code page that is associated with the specified column collation. Ensure that your collation specifications match the collations in the data file.

  • If a column collation is not specified, but a code page is specified by using the code-page option in the command or the bcp_control BCPFILECP hint, the SQLCHAR data is interpreted by using the specified code page.

    Note

    For information about the code-page options, see Using a Code Page, earlier in this topic. For more information, see about the BCPFILECP hint, see bcp_control.

  • If you do not specify any collations or a code page, then data in SQLCHAR columns is interpreted by using the OEM code page of the client computer.

On a bulk-import operation, the collation specification controls the following:

  • How the operation attempts to interpret the code page of SQLCHAR columns in the data file.

  • How bcp or BULK INSERT applies the ORDER hint, if specified.

    If you use the ORDER hint, the bulk-import operation uses collations to properly interpret the ORDER hint. This hint applies to both SQLCHAR and SQLNCHAR columns. The data in the columns referenced by the ORDER hint must be in the sequence that is defined by the collation that is mapped to those columns. For more information, see Controlling the Sort Order When Bulk Importing Data.