Using Unicode Encoding 

Applications that target the common language runtime use encoding to map character representations from the native character scheme (Unicode) to other schemes. Applications use decoding to map characters from nonnative schemes (non-Unicode) to the native scheme. The System.Text namespace provides classes that allow you to encode and decode characters. System.Text encoding support includes the following encodings:

  • Unicode UTF-32 encoding
    Unicode UTF-32 encoding represents Unicode characters as sequences of 32-bit integers. You can use the UTF32Encoding class to convert characters to and from UTF-32 encoding.
  • Unicode UTF-16 encoding
    Unicode UTF-16 encoding represents Unicode characters as sequences of 16-bit integers. You can use the UnicodeEncoding class to convert characters to and from UTF-16 encoding.
  • Unicode UTF-8 encoding
    Unicode UTF-8 encoding represents Unicode characters as sequences of 8-bit bytes. You can use the UTF8Encoding class to convert characters to and from UTF-8 encoding.
  • Unicode UTF-7 encoding
    Unicode UTF-7 encoding represents Unicode characters as sequences of 7-bit ASCII characters. Non-ASCII Unicode characters are represented by an escape sequence of ASCII characters.

    The UTF-7 encoding exists in support of certain protocols for which it is required; generally, these are e-mail or newsgroup protocols. However, UTF-7 is not particularly secure or robust. In some situations, changing one bit can radically alter the interpretation of an entire UTF-7 string. In other situations, quite substantially different UTF-7 strings can encode the same text. Furthermore, for sequences that include non-ASCII characters UTF-7 is much less space-efficient than UTF-8, and encoding/decoding is slower. Consequently, UTF-7 should generally not be used where there is a choice in the matter: UTF-8 should normally be preferred to UTF-7.

    You can use the UTF7Encoding class to convert characters to and from UTF-7 encoding.

  • ASCII encoding
    ASCII encoding encodes the Latin alphabet as single 7-bit ASCII characters. Because this encoding only supports character values from U+0000 through U+007F, in most cases it is inadequate for internationalized applications. You can use the ASCIIEncoding class to convert characters to and from ASCII encoding. For examples of using the ASCIIEncoding class in code, see Encoding Base Types.
  • ANSI/ISO Encodings
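    The System.Text.Encoding class provides support for a wide range of ANSI/ISO encodings. You obtain an object for one of these encodings by passing its code page number or name to the Encoding.GetEncoding method.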
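
To make the size differences between the encodings above concrete, the following is a minimal sketch that counts the bytes each encoding class needs for the same short string. The class name EncodingSizes, the helper ByteCount, and the sample string are illustrative assumptions, not part of the original topic. Encoding.GetByteCount does not include a byte order mark, so the counts reflect the character data only.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: returns the number of bytes needed to
    // store text in the given encoding (no byte order mark included).
    public static int ByteCount(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        // 'A' (U+0041), e with acute accent (U+00E9), Hiragana 'a' (U+3042).
        string sample = "A\u00e9\u3042";

        Encoding[] encodings =
        {
            new UTF32Encoding(),
            new UnicodeEncoding(),
            new UTF8Encoding(),
            new ASCIIEncoding()
        };

        foreach (Encoding e in encodings)
        {
            Console.WriteLine("{0}: {1} bytes", e.WebName, ByteCount(e, sample));
        }
    }
}
```

For this sample, UTF-32 uses a fixed four bytes per character, UTF-16 uses two bytes per character, and UTF-8 uses one, two, and three bytes respectively. ASCII reports one byte per character only because the two non-ASCII characters are replaced during encoding, which loses data.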

Using the Encoding Class

You can use the Encoding.GetEncoding method to return an encoding object for a specified encoding. You can use the Encoding.GetBytes method to convert a Unicode string to its byte representation in a specified encoding.

The following code example uses the Encoding.GetEncoding method to create a target encoding object for a specified code page. The Encoding.GetBytes method is called on the target encoding object to convert a Unicode string to its byte representation in the target encoding. The byte representations of the strings in the specified code pages are displayed.

Imports System
Imports System.IO
Imports System.Globalization
Imports System.Text

Public Class Encoding_UnicodeToCP
   Public Shared Sub Main()
      ' Converts ASCII characters to bytes.
      ' Displays the string's byte representation in the 
      ' specified code page.
      ' Code page 1252 represents Latin characters.
      PrintCPBytes("Hello, World!", 1252)
      ' Code page 932 represents Japanese characters.
      PrintCPBytes("Hello, World!", 932)
      
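      ' Converts Japanese characters to bytes.
      ' Visual Basic string literals have no \u escape sequence,
      ' so the string is built with ChrW.
      Dim japanese As String = ChrW(&H307B) & "," & ChrW(&H308B) & "," & _
         ChrW(&H305A) & "," & ChrW(&H3042) & "," & ChrW(&H306D)
      PrintCPBytes(japanese, 1252)
      PrintCPBytes(japanese, 932)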
   End Sub

   Public Shared Sub PrintCPBytes(str As String, codePage As Integer)
      Dim targetEncoding As Encoding
      Dim encodedChars() As Byte      
      
      ' Gets the encoding for the specified code page.
      targetEncoding = Encoding.GetEncoding(codePage)
      
      ' Gets the byte representation of the specified string.
      encodedChars = targetEncoding.GetBytes(str)
      
      ' Prints the bytes.
      Console.WriteLine("Byte representation of '{0}' in CP '{1}':", _
         str, codePage)
      Dim i As Integer
      For i = 0 To encodedChars.Length - 1
         Console.WriteLine("Byte {0}: {1}", i, encodedChars(i))
      Next i
   End Sub
End Class

using System;
using System.IO;
using System.Globalization;
using System.Text;

public class Encoding_UnicodeToCP
{
   public static void Main()
   {
      // Converts ASCII characters to bytes.
      // Displays the string's byte representation in the 
      // specified code page.
      // Code page 1252 represents Latin characters.
      PrintCPBytes("Hello, World!",1252);
      // Code page 932 represents Japanese characters.
      PrintCPBytes("Hello, World!",932);

      // Converts Japanese characters to bytes.
      PrintCPBytes("\u307b,\u308b,\u305a,\u3042,\u306d",1252);
      PrintCPBytes("\u307b,\u308b,\u305a,\u3042,\u306d",932);
   }

   public static void PrintCPBytes(string str, int codePage)
   {
      Encoding targetEncoding;
      byte[] encodedChars;

      // Gets the encoding for the specified code page.
      targetEncoding = Encoding.GetEncoding(codePage);

      // Gets the byte representation of the specified string.
      encodedChars = targetEncoding.GetBytes(str);

      // Prints the bytes.
      Console.WriteLine("Byte representation of '{0}' in CP '{1}':",
         str, codePage);
      for (int i = 0; i < encodedChars.Length; i++)
         Console.WriteLine("Byte {0}: {1}", i, encodedChars[i]);
   }
}
Note

If you execute this code in a console application, the specified Unicode text elements might not be displayed correctly because the support for Unicode characters in the console environment varies depending on the version of the Windows operating system that is running.
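
The example above only encodes. As a complement, the following minimal sketch decodes the bytes back to a string with Encoding.GetString and shows what happens when the target encoding cannot represent the characters. The class name RoundTripSketch and the helpers RoundTrip and CanEncode are illustrative assumptions. The sketch uses ASCII and UTF-8 rather than code pages 1252 and 932, on the assumption that these two encodings are available on every .NET implementation without extra setup.

```csharp
using System;
using System.Text;

public static class RoundTripSketch
{
    // Hypothetical helper: encodes text with the given encoding,
    // then decodes the resulting bytes back to a string.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    // Hypothetical helper: returns true if the named encoding can
    // represent every character in text. Exception fallbacks make
    // unsupported characters throw instead of being silently replaced.
    public static bool CanEncode(string encodingName, string text)
    {
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string japanese = "\u307b,\u308b,\u305a,\u3042,\u306d";

        // UTF-8 can represent every Unicode character, so the text survives.
        Console.WriteLine(RoundTrip(Encoding.UTF8, japanese) == japanese);

        // ASCII replaces each unsupported character with '?'.
        Console.WriteLine(RoundTrip(Encoding.ASCII, japanese));

        Console.WriteLine(CanEncode("us-ascii", japanese));
        Console.WriteLine(CanEncode("utf-8", japanese));
    }
}
```

Checking with a strict encoding object before converting is one way to detect data loss early. Whether to reject the text or accept replacement characters depends on the application.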

You can use these methods in an ASP.NET application to determine the encoding to use for response characters. Set the value of the HttpResponse.ContentEncoding property to the value returned by the appropriate method. The following code example illustrates how to set HttpResponse.ContentEncoding.

' Explicitly sets ContentEncoding to UTF-8.
Response.ContentEncoding = Encoding.UTF8

' Sets ContentEncoding using the name of an encoding.
Response.ContentEncoding = Encoding.GetEncoding(name)

' Sets ContentEncoding using a code page number.
Response.ContentEncoding = Encoding.GetEncoding(codepageNumber)

// Explicitly sets the encoding to UTF-8.
Response.ContentEncoding = Encoding.UTF8;

// Sets ContentEncoding using the name of an encoding.
Response.ContentEncoding = Encoding.GetEncoding(name);

// Sets ContentEncoding using a code page number.
Response.ContentEncoding = Encoding.GetEncoding(codepageNumber);

For most ASP.NET applications, you should match the HttpResponse.ContentEncoding property to the HttpRequest.ContentEncoding property in order to display text in the encoding that the client expects.
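
Encoding.GetEncoding throws an ArgumentException when it is given a name it does not recognize, so an application that takes the encoding name from outside input may want a fallback. The following is a minimal sketch of one way to do that. The class EncodingResolver and its Resolve method are hypothetical names introduced for illustration, and falling back to UTF-8 is an assumption rather than a documented recommendation.

```csharp
using System;
using System.Text;

public static class EncodingResolver
{
    // Hypothetical helper: returns the encoding for the given name,
    // or UTF-8 when the name is missing or not recognized.
    public static Encoding Resolve(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            return Encoding.UTF8;
        }

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported encoding name.
            return Encoding.UTF8;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("utf-16").WebName);
        Console.WriteLine(Resolve("no-such-charset").WebName);
        Console.WriteLine(Resolve(null).WebName);
    }
}
```

In an ASP.NET page, the result could be assigned in the same way as in the snippets above, for example Response.ContentEncoding = EncodingResolver.Resolve(name).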

For more information about using encodings in ASP.NET, see the Multiple Encodings Sample in the Common Tasks QuickStart and the Setting Culture and Encoding Sample in the ASP.NET QuickStart.

See Also

Reference

System.Text Namespace

Concepts

Unicode in the .NET Framework

Other Resources

Encoding and Localization