Collation and International Terminology
Collations in SQL Server provide sorting rules, case, and accent sensitivity properties for your data. Collations that are used with character data types such as char and varchar dictate the code page and corresponding characters that can be represented for that data type. Whether you are installing a new instance of SQL Server, restoring a database backup, or connecting server to client databases, it is important that you understand the locale requirements, sorting order, and case and accent sensitivity of the data you will be working with.
When you select a collation for your server, database, column, or expression, you are assigning certain characteristics to your data that will affect the results of many operations in your database. For example, when you construct a query by using ORDER BY, the sort order of your result set might be dependent on the collation that is applied to the database or dictated in a COLLATE clause at the expression level of the query.
A collation can contain any or all of the following characteristics:
Case sensitivity
Accent sensitivity
Kana sensitivity
Width sensitivity
To best use collation support in SQL Server, you must understand the terms that are defined in this topic, and how they relate to the characteristics of your data.
Collation
A collation specifies the bit patterns that represent each character in a data set. Collations also determine the rules that sort and compare data. SQL Server supports storing objects with different collations in a single database. For non-Unicode columns, the collation setting specifies the code page for the data and which characters can be represented. Data that is moved between non-Unicode columns must be converted from the source code page to the destination code page.
Transact-SQL statement results can vary when the statement is run in the context of different databases that have different collation settings. If it is possible, use a standardized collation for your organization. This way, you do not have to explicitly specify the collation in every character or Unicode expression. If you must work with objects that have different collation and code page settings, code your queries to consider the rules of collation precedence. For more information, see Collation Precedence (Transact-SQL).
The characteristics of a collation are language sensitivity, case sensitivity, accent sensitivity, Kana-sensitive, and width sensitivity.
SQL Server collations include the following collation sets:
Setting collations are supported at the following levels of an instance of SQL Server:
Locale
A locale is a set of information that is associated with a place or a culture. This can include the name and identifier of the spoken language, the script that is used to write the language, and cultural conventions. Collations can be associated with one or more locales.
Unicode
If you store character data that reflects multiple languages, always use Unicode data types (nchar, nvarchar, and ntext) instead of the non-Unicode data types (char, varchar, and text).
Significant limitations are associated with non-Unicode data types. This is because a non-Unicode computer will be limited to use of a single code page. You might experience performance gain by using Unicode because fewer code-page conversions are required. Unicode collations must be selected individually at the database, column or expression level because they are not supported at the server level.
The code pages that a client uses are determined by your operating system settings. To set client code pages on the Windows 2000, Windows XP, Windows Server 2003, or Windows Server 2008 operating systems, use Regional Settings in Control Panel.
When you move data from a server to a client, your server collation might not be recognized by older client drivers. This can happen when you move data from a Unicode server to a non-Unicode client. Your best option might be to upgrade the client operating system so that the underlying system collations are updated. If your client has database client software installed, you might consider applying a service update to the database client software.
You can also try to use a different collation for the data on the server. Choose a collation that will map to a code page on the client. For more information, see the "Setting and Changing Collations" topic in SQL Server Books Online.
To evaluate issues that are related to using Unicode or non-Unicode data types, test your scenario to measure performance differences in your environment. It is a good practice to standardize the collation that is used on systems across your organization, and deploy Unicode servers and clients wherever possible. For more information about Unicode, see the Unicode Consortium Web site.
In many situations, SQL Server will interact with other servers or clients, and your organization might use multiple data access standards between applications and server instances. SQL Server clients are one of two main types:
Unicode clients that use OLE DB and Open Database Connectivity (ODBC) version 3.7 or a later version.
Non-Unicode clients that use DB-Library and ODBC version 3.6 or an earlier version.
The following table provides information about using multilingual data with various combinations of Unicode and non-Unicode servers.
Server | Client | Benefits or Limitations |
|---|---|---|
Unicode | Unicode | Because Unicode data will be used throughout the system, this scenario provides the best performance and protection from corruption of retrieved data. This is the situation with ActiveX Data Objects (ADO), OLE DB, and ODBC version 3.7 or a later version. |
Unicode | Non-Unicode | In this scenario, especially with connections between a server that is running a newer operating system and a client that is running an older version of SQL Server, or on an older operating system, there can be limitations or errors when you move data to a client computer. Unicode data on the server will try to map to a corresponding code page on the non-Unicode client to convert the data. |
Non-Unicode | Unicode | This is not an ideal configuration for using multilingual data. You cannot write Unicode data to the non-Unicode server. Problems are likely to occur when data is sent to servers that are outside the server's code page. |
Non-Unicode | Non-Unicode | This is a very limiting scenario for multilingual data. You can use only a single code page. |
If you experience errors or difficulties when moving data from a server to a client, specifically from a Unicode server to a non-Unicode client, your server collation might not be recognized by older client drivers. In this situation, your best option might be to upgrade the client operating system so that the underlying system collations are updated. If your client has database client software installed, you might consider applying a service update to the database client software.
You can also try to use a different collation for the data on the server. Choose a collation that will map to a code page on the client. For more information about changing collations, see the "Setting and Changing Collations in SQL Server" topic in SQL Server Books Online. For more information about changing collations, see the The Impact of Changing Collations and of Changing Data Types from Non-Unicode to Unicode from the Microsoft Download Center. For more information about migrating non-Unicode datatypes to Unicode, see the SQL Server Best Practices Migration to Unicode whitepaper from the Microsoft Download Center.
Related Topics: Unicode Basics
Code Page
A code page is an ordered set of characters of a given script in which a numeric index, or code point value, is associated with each character. A Windows code page is commonly referred to as a character set or charset. Code pages are used to provide support for the character sets and keyboard layouts that are used by different Windows system locales.
All Windows Server 2008 Unicode collations are Unicode 5.0-based.
Data Types
A data type is a definition that specifies a range of values, the operations that can be performed on the values, and how the values are stored in memory. Defining data types lets SQL Server predictably manipulate data. Non-Unicode character data types are char, varchar, and text. Unicode data types are nchar, nvarchar, and ntext. We recommend that you use Unicode data types in your applications, especially if you store character data that reflects multiple languages.
For more information about migrating non-Unicode datatypes to Unicode, see the SQL Server Best Practices Migration to Unicode whitepaper.
Related Topics: Data Types (Database Engine), Data Types (Transact-SQL), Integration Services Data Types
Sort Order
Sort order specifies how data values are sorted. This affects the results of data comparison. Data is sorted by using collations, and it can be optimized by using indexes.
Related Topics: Windows Collation Sorting Styles, Indexes

Note