Unicode Basics

Storing data in multiple languages within one database is difficult to manage when you use only character data and code pages. It is also difficult to find one code page for the database that can store all the required language-specific characters. Additionally, it is difficult to guarantee the correct translation of special characters when the data is read or updated by different clients that run various code pages. Databases that support international clients should always use Unicode data types instead of non-Unicode data types.

For example, consider a database of customers in North America that must handle three major languages:

  • Spanish names and addresses for Mexico
  • French names and addresses for Quebec
  • English names and addresses for the rest of Canada and the United States

When you use only character columns and code pages, you must make sure that the database is installed with a code page that handles the characters of all three languages. You must also guarantee that characters from one of the languages are translated correctly when they are read by clients that run a code page for another language.
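The translation problem described above can be sketched outside the database. The following Python snippet is only an illustration, not SQL Server behavior: it assumes a writer that uses Windows code page 1252 and a reader that uses OEM code page 437, and the customer name is hypothetical.

```python
# Illustrative sketch: Python codecs stand in for client code pages.
name = "Muñoz"  # hypothetical Spanish customer name

# A client running code page 1252 writes the name as single-byte characters.
stored_bytes = name.encode("cp1252")

# A client running code page 437 interprets the same bytes differently,
# so the character outside the ASCII range is mistranslated.
seen_by_other_client = stored_bytes.decode("cp437")

# When both sides use a Unicode encoding, no translation is needed.
round_trip = name.encode("utf-16-le").decode("utf-16-le")

print(seen_by_other_client == name)  # the code page mismatch changed the text
print(round_trip == name)            # the Unicode round trip preserved it
```

The bytes never change between the two clients. Only their interpretation does, which is why the mismatch can go unnoticed until a reader sees the wrong character.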

With the growth of the Internet, it is even more important to support many client computers that are running different locales. Selecting a code page for character data types that will support all the characters required by a worldwide audience would be difficult.
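As a rough illustration of why one code page rarely covers a worldwide audience, the following Python sketch checks which sample names a few encodings can represent. The helper can_encode and the sample names are assumptions made for this example. The codecs are the ones that ship with Python, not SQL Server collations.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical names from customers in different locales.
samples = ["José", "Zoë", "Łukasz", "東京"]

# Three single-locale code pages, followed by a Unicode encoding.
for codec in ("cp1252", "cp1250", "cp932", "utf-16-le"):
    covered = [s for s in samples if can_encode(s, codec)]
    print(f"{codec}: {len(covered)} of {len(samples)} names representable")
```

In this sketch, only the Unicode encoding is expected to represent every sample name.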

The easiest way to manage character data in international databases is to always use the Unicode nchar, nvarchar, and nvarchar(max) data types, instead of their non-Unicode equivalents, char, varchar, and varchar(max).

Unicode is a standard for mapping code points to characters. Because it is designed to cover all the characters of all the languages of the world, there is no need for different code pages to handle different sets of characters. SQL Server 2005 supports the Unicode Standard, Version 3.2.
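The mapping between code points and characters can be inspected with any Unicode-aware tool. The following Python sketch uses the standard unicodedata module. The helper code_point_label is a name introduced here for illustration, and the Unicode version that Python reports may be later than the version that SQL Server 2005 supports.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Format the code point of a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


# Each character has one code point, regardless of any client code page.
for ch in "ñé€":
    print(code_point_label(ch), unicodedata.name(ch))

# The mapping also works in reverse: a code point identifies one character.
print(chr(0x00F1) == "ñ")
```

Because the code point, and not a code page-specific byte value, identifies the character, every client that uses Unicode resolves it to the same character.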

If all the applications that work with international databases also use Unicode variables instead of non-Unicode variables, character translations do not have to be performed anywhere in the system. Clients will see the same characters in the data as all other clients.

SQL Server 2005 stores all textual system catalog data in columns having Unicode data types. The names of database objects, such as tables, views, and stored procedures, are stored in Unicode columns. This enables applications to be developed by using only Unicode, and helps avoid all issues with code page conversions.

See Also

Concepts

Using Unicode Data
Storage and Performance Effects of Unicode

Change History

Release History

17 July 2006

Changed content:
  • Replaced reference to ntext data type with nvarchar(max) data type.