Unicode-only Collations

The Unicode specification defines a single encoding scheme for most characters widely used by businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using this single specification, so the same bit pattern is always converted to the same character on every computer. Data can therefore be transferred freely from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.
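A short illustration of the problem Unicode solves, written in Python purely as a stand-in (the codecs used are Python's built-in Windows code page and UTF-16 codecs, not SQL Server components). Under legacy code pages the same byte decodes to different characters, while text encoded as UTF-16 decodes back to the same characters.

```python
# The same byte value 0xE9 means different characters under
# different legacy Windows code pages.
raw = bytes([0xE9])
western = raw.decode("cp1252")   # Windows-1252 (Western European): e with acute
cyrillic = raw.decode("cp1251")  # Windows-1251 (Cyrillic): short i

# Print the Unicode code points to show they differ.
print(hex(ord(western)), hex(ord(cyrillic)))  # 0xe9 0x439

# Unicode gives each character one code point, so encoding to UTF-16
# and decoding again yields the same text on any system.
text = "\u00e9 \u0439"
utf16 = text.encode("utf-16-le")
assert utf16.decode("utf-16-le") == text
```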

SQL Server stores Unicode data in 2-byte units (the UCS-2/UTF-16 encoding). Two bytes provide 65,536 different bit patterns, enough for a single specification covering the characters of the most common business languages; less common supplementary characters are stored as a pair of 2-byte units (4 bytes). You can minimize character conversion issues by using Unicode data types throughout your system.
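The byte arithmetic can be checked with a small Python sketch. It assumes UTF-16 little-endian without a byte-order mark as the storage encoding and uses Python's `utf-16-le` codec as a stand-in; the helper `utf16_byte_length` is hypothetical and not part of any SQL Server API.

```python
# Two bytes give 2**16 distinct bit patterns.
print(2 ** 16)  # 65536

def utf16_byte_length(s: str) -> int:
    """Hypothetical helper: bytes needed to store s as UTF-16LE (no BOM)."""
    return len(s.encode("utf-16-le"))

print(utf16_byte_length("A"))           # 2: Basic Multilingual Plane character
print(utf16_byte_length("\u00e9"))      # 2: e with acute, also in the BMP
print(utf16_byte_length("\U0001F600"))  # 4: outside the BMP, stored as a surrogate pair
```

Most business text stays within the Basic Multilingual Plane, which is why the 2-bytes-per-character rule of thumb usually holds.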

Using COLLATE

Windows Unicode-only collations can be applied only to Unicode data, such as nchar, nvarchar, and nvarchar(max), and they have no associated code page.

In SQL Server, these data types support Unicode data:

  • nchar

  • nvarchar

  • ntext

You can use Unicode-only collations in the COLLATE clause to set the collation of nchar, nvarchar, and ntext data at the column level and at the expression level. However, you cannot use a Unicode-only collation in the COLLATE clause to change the collation of a database or a server instance.

Unicode-only collations can be useful when exchanging data between a server and client database applications. Legacy client applications are often installed on older operating systems, and such a client may not recognize a newer Windows collation that has been applied to a SQL Server database or server running on a newer operating system. If a Unicode-only collation is applied to specific column-level or expression-level data on the server, that data has no code page, so the client does not attempt to map it to an incorrect code page, and data imported to the client keeps its characters intact.
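The following Python sketch simulates, in simplified form, what can go wrong when a client forces server data through a code page that lacks some of its characters. It is not client driver code: Windows-1252 is an assumed example of a legacy client code page, the sample value is hypothetical, and real client libraries may substitute characters differently (for example with look-alike "best fit" characters instead of '?').

```python
# Hypothetical server-side value: Cyrillic capital Zhe and Greek capital Omega.
server_value = "\u0416\u03a9"

# A client that maps the data to a code page lacking these characters
# (here Windows-1252) can only substitute '?' for each of them.
via_code_page = server_value.encode("cp1252", errors="replace").decode("cp1252")
print(via_code_page)  # ??

# Kept as Unicode (UTF-16 here), the value survives the round trip intact.
via_unicode = server_value.encode("utf-16-le").decode("utf-16-le")
print(via_unicode == server_value)  # True
```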