Planning an International Move

02/20/2014

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

You can install, customize, and maintain a single version of Microsoft® Project 2002 throughout your multinational organization. The plug-in language features of Microsoft Project allow users in foreign locales to continue working in their own languages. Alternatively, you can deploy a localized version of Microsoft Project for each language-speaking area.

Migrating Settings from Previous Localized Versions

If your organization is upgrading from a previous localized version of Microsoft Project to Microsoft Project 2002 with the Multilanguage User Interface (MUI) Pack, you can customize the Microsoft Project Setup program so that users' settings and preferences migrate from the previous localized version to the new installation of Microsoft Project 2002.

Because user settings in the previous localized version of Microsoft Project are designed to work with that language version, the settings cannot migrate across language versions of Microsoft Project. Therefore, if you are deploying Microsoft Project 2002 with the MUI Pack and you want to migrate user settings, you must set the installation language of Microsoft Project 2002 to match the language of users' previous localized version of Microsoft Project. Then, when users run the Microsoft Project Setup program, their settings migrate to Microsoft Project 2002.

Notes

You can also migrate user settings from a previous localized version of Microsoft Project to the matching language version of Microsoft Project 2002.

The MUI Pack for Microsoft Project 2002 is available through Microsoft licensing programs such as Open, Select, and Enterprise Agreement. The MUI Pack can also be purchased through Microsoft Licensing.

If a standard deployment throughout your organization is important and you don't want to deploy multiple settings for the installation language, leave the installation language set to English and disable migration of user settings. In this case, user settings cannot migrate across language versions of Microsoft Project, and settings from previous non-English versions of Microsoft Project are lost.

Localized versions of Microsoft Project 2000 and earlier were based on character encoding standards that varied from one script to another. When users working in one language version of Microsoft Project exchanged documents with a user who worked in another language version of Microsoft Project, text was often garbled because of the difference between character encodings. Microsoft Project 2002 is based on an international character encoding standard — Unicode — that allows users upgrading to Microsoft Project 2002 to more easily share documents across languages.

Multilingual documents can contain text in languages that require different scripts. A single script can be used to represent many languages.

For example, the Latin or Roman script has character shapes — glyphs — for the 26 letters (both uppercase and lowercase) of the English alphabet, as well as accented (extended) characters used to represent sounds in other Western European languages.

The Latin script has glyphs to represent all of the characters in most European languages and a few others. Other European languages, such as Greek or Russian, have characters for which there are no glyphs in the Latin script; these languages have their own scripts.

Some Asian languages use ideographic scripts that have glyphs based on Chinese characters. Other languages, such as Thai and Arabic, use complex scripts, which have glyphs that are composed of several smaller glyphs or glyphs that must be shaped differently, depending on adjacent characters.

A common way to store text is to represent each character by using a single byte. The value of each byte is a numeric index, or code point, in a table of characters; each code point corresponds to one character in that table. For example, a byte whose code point is the decimal value 65 might represent a capital letter A.

This table of characters is called a code page. Because each character in the code page is represented by a single byte, and a byte can hold only 256 distinct values, a code page can contain as many as 256 characters. One code page with its limit of 256 characters cannot accommodate all languages because some languages use far more than 256 characters. Therefore, different scripts use separate code pages. There is one code page for Greek, another for Cyrillic, and so on.

Single-byte code pages cannot accommodate Asian languages, which commonly use more than 5,000 Chinese-based characters. Double-byte code pages were developed to support these languages.

One drawback of the code page system is that the character represented by a particular code point depends on the specific code page on which the code point resides. If you don't know which code page a code point is from, you cannot determine how to interpret the code point.

For example, unless you know which code page it comes from, the code point 230 might be the Greek lowercase zeta (ζ), the Cyrillic lowercase zhe (ж), or the Western European diphthong (æ). All three characters have the same code point (230), but the code point is from three different code pages (1253, 1251, and 1252, respectively).
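The ambiguity described above can be reproduced with a short sketch. It uses Python's built-in codecs for Windows code pages 1253, 1251, and 1252 as a stand-in for an application reading code page-based text; the variable names are illustrative only and are not part of Microsoft Project.

```python
import unicodedata

# The single byte value 230 (0xE6), as it would be stored in code page-based text.
raw = bytes([230])

# Decode the same byte under the three code pages named in the text:
# Greek (1253), Cyrillic (1251), and Western European (1252).
decoded = {cp: raw.decode(cp) for cp in ("cp1253", "cp1251", "cp1252")}

for code_page, char in decoded.items():
    # Print the Unicode character name so the output stays plain ASCII.
    print(code_page, unicodedata.name(char))
# cp1253 GREEK SMALL LETTER ZETA
# cp1251 CYRILLIC SMALL LETTER ZHE
# cp1252 LATIN SMALL LETTER AE
```

The byte never changes; only the code page chosen to interpret it does.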

Introducing a Worldwide Character Set

Unicode was developed to create a universal character set that can accommodate all known scripts. Unicode uses a unique, two-byte encoding for every character; so in contrast to code pages, every character has its own unique code point. For example, the Unicode code point of lowercase zeta (ζ) is the hexadecimal value 03B6, lowercase zhe (ж) is 0436, and the diphthong (æ) is 00E6.
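As a quick check of the values quoted above, the following sketch looks up the Unicode code point of each of the three characters and confirms that each occupies two bytes when stored as UTF-16. The dictionary keys are informal labels chosen for this example.

```python
# The three characters from the example, written as Unicode escapes.
chars = {"zeta": "\u03b6", "zhe": "\u0436", "ae": "\u00e6"}

# ord() returns the Unicode code point; format it as four hexadecimal digits.
code_points = {name: f"{ord(ch):04X}" for name, ch in chars.items()}
print(code_points)
# {'zeta': '03B6', 'zhe': '0436', 'ae': '00E6'}

# Each of these characters takes two bytes in UTF-16 (little-endian, no byte-order mark).
utf16_sizes = {name: len(ch.encode("utf-16-le")) for name, ch in chars.items()}
print(utf16_sizes)
# {'zeta': 2, 'zhe': 2, 'ae': 2}
```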

Unicode 2.0 defines code points for approximately 40,000 characters. More definitions are being added in Unicode 2.1 and Unicode 3.0. Built-in expansion mechanisms in Unicode allow for more than one million characters to be defined, which is more than sufficient for all known scripts.

In the Microsoft Windows® operating systems, the two systems of storing text, code pages and Unicode, coexist. However, Unicode-based systems are replacing code page-based systems. For example, Microsoft Windows NT®, Microsoft Project 98 and later, and Microsoft Internet Explorer version 4.0 and later are all based on Unicode.

Taking Advantage of Unicode Support

Microsoft Project 2002 is based on an international character encoding standard — Unicode — that allows users upgrading to Microsoft Project 2002 to more easily share documents across languages. Unicode support in Microsoft Project 2002 also allows users to read international documents created in any previous versions of Microsoft Project.

Microsoft Project 2002 also provides the conversion tables necessary to convert code page-based data to Unicode and back again for interaction with previous versions of applications. Because Microsoft Project 2002 provides fonts to support many languages, users can create multilingual documents with text from multiple scripts.
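The article does not describe the conversion tables themselves. As a rough illustration of the idea, the sketch below uses Python's built-in code page codecs to convert Unicode text to a code page and back, and to detect text that a given code page cannot represent. The function names are hypothetical and do not correspond to anything in Microsoft Project.

```python
def to_code_page(text: str, code_page: str) -> bytes:
    """Convert Unicode text to bytes in the given code page.

    Raises UnicodeEncodeError if a character has no slot in that code page.
    """
    return text.encode(code_page)


def from_code_page(data: bytes, code_page: str) -> str:
    """Convert code page-based bytes back to Unicode text."""
    return data.decode(code_page)


def fits_code_page(text: str, code_page: str) -> bool:
    """Report whether every character of text exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Greek zeta survives a round trip through the Greek code page (1253).
greek = "\u03b6"
round_trip = from_code_page(to_code_page(greek, "cp1253"), "cp1253")

# Cyrillic zhe has no slot in the Greek code page, so it cannot be converted.
zhe_fits_greek = fits_code_page("\u0436", "cp1253")
print(round_trip == greek, zhe_fits_greek)
# True False
```

The round trip is lossless only when the source and target code page agree and every character exists in it, which is why mixed-script text is the difficult case for older, code page-based versions.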

For example, Unicode support in Microsoft Project 2002 allows users to copy multilingual text from Office 97 documents, paste the text into a Microsoft Project 2002 document — and that text will be displayed correctly. Conversely, multilingual text copied from any Microsoft Project 2002 document can be pasted into a document created in any previous version of Microsoft Project.

In addition to document text, Microsoft Project 2002 supports Unicode in other areas, including document properties, bookmarks, style names, footnotes, and user information. Unicode support in Microsoft Project 2002 also allows you to edit and display multilingual text in dialog boxes. For example, you can search for a file by a Greek author's name in the Open dialog box.

Using Unicode Values in Visual Basic for Applications

The Microsoft Visual Basic® for Applications environment does not support Unicode. Only text supported by the operating system can be used in the Visual Basic Editor or displayed in custom dialog boxes or message boxes.

You can use the ChrW() function to manipulate text outside the code page. The ChrW() function accepts a number that represents the Unicode value of a character and returns that character string.

Using Local Language File Names

Under Windows 98 and Windows Me, Unicode characters in file names are not supported, but they are supported under Windows NT 4.0, Windows 2000, and Windows XP. Under Windows 98 and Windows Me, file names must use characters that exist in the code page of the operating system.

If users in your organization share files between language versions of Windows, they should use ASCII characters (unaccented Latin script) to ensure that the file names can be used in any language version of the operating system.
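An administrator could screen file names for this guideline with a small script. The following is a minimal sketch, assuming Python is available for such checks; both function names and the sample file names are hypothetical.

```python
def is_ascii_file_name(name: str) -> bool:
    """True if the name uses only ASCII characters (code points below 128)."""
    return all(ord(ch) < 128 for ch in name)


def fits_system_code_page(name: str, code_page: str) -> bool:
    """True if every character of the name exists in the given code page.

    This approximates the Windows 98 and Windows Me restriction described
    above; code_page is a Python codec name such as "cp1252".
    """
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


plain = "Projektplan.mpp"
accented = "\u00dcbersicht.mpp"  # begins with U-umlaut

print(is_ascii_file_name(plain), is_ascii_file_name(accented))
# True False
print(fits_system_code_page(accented, "cp1252"),
      fits_system_code_page(accented, "cp1253"))
# True False
```

The accented name works on a Western European system but not on a Greek one, whereas the ASCII-only name is safe on both.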

Microsoft Project 2002 now supports opening and saving files with Unicode file names, by clicking Open (File menu) in Microsoft Project, or by double-clicking the file name in Windows Explorer.

Printing and Displaying Unicode Text

Not all printers can print characters from more than one code page. In particular, printers that have built-in fonts might not have characters for other scripts in those fonts. Also, new characters such as the euro currency symbol might be missing from a particular font.

Although Microsoft Project 2002 contains many workarounds to enable printing on such printers, it is not possible in all cases. If text is not printing correctly, updating the printer driver might fix the problem. If the latest driver does not fix the problem, you can try using the "download soft fonts" or "print TrueType as graphic" settings in the printer driver options. Change to one of these settings, and try printing again.

If the text still does not print correctly, you can create a registry entry that works around the printing problems of most printers; the printing quality, however, might be lowered.

To set the registry so that extended characters are printed correctly

Go to the following registry subkey:

Add a new value entry named NoWideTextPrinting and set its value to 1.

Compressing Files that Contain Unicode Text

Microsoft Project 2002 stores text in a form of Unicode called "UTF-16." Unicode characters are encoded in two bytes (or very rarely, four bytes), rather than what is used in non-Unicode systems (that is, a single byte, or a mixture of one and two bytes in some Asian languages). Generally, Microsoft Project 2002 files containing multilingual text are similar in size to files from previous versions of Microsoft Project. However, Microsoft Project 2002 files may be 30 to 50 percent larger than files created in previous, non-Unicode versions of Microsoft Project.
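The byte counts mentioned above can be illustrated on raw text alone. This sketch compares the same string stored in a single-byte code page and in UTF-16. It says nothing about the actual Microsoft Project file format, which also holds non-text data, so a doubling of the raw text does not imply a doubling of the whole file.

```python
text = "Project schedule"  # 16 Latin characters

# One byte per character in a single-byte code page.
single_byte_size = len(text.encode("cp1252"))

# Two bytes per character in UTF-16 (little-endian, no byte-order mark).
utf16_size = len(text.encode("utf-16-le"))

print(single_byte_size, utf16_size)
# 16 32

# A character outside the 16-bit range (here U+20000) needs four bytes,
# which is the "very rarely, four bytes" case.
four_byte_size = len("\U00020000".encode("utf-16-le"))
print(four_byte_size)
# 4
```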

Copying Multilingual Text

You can use the Clipboard to copy multilingual text from Microsoft Project 2002 to an Office XP application. Text in RTF, HTML, and Unicode formats can successfully be pasted into Office XP applications.

Multilingual Code Page-based Single-byte Text

If users paste single-byte (ANSI) text into a Microsoft Project 2002 file from a code page that is different from the one their operating system uses, they are likely to get unintelligible characters in their text. This problem occurs because Microsoft Project cannot determine which code page to use to interpret the single-byte text.

For example, you might paste text from a non-Unicode text editor that uses fonts to indicate which code page to use. If the text editor supplies only RTF and single-byte text, the font (and code page) information is lost when the text is pasted into an application that does not accept RTF. Instead, the application uses the operating system's code page, which may map some characters' code points to unexpected or nonexistent characters.
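The effect can be simulated outside Microsoft Project. In this sketch, a Greek word is stored as single-byte text in code page 1253 and then interpreted with code page 1252, as a Western European system would do when no code page information accompanies the text. The variable names are illustrative.

```python
# The Greek word for "Greece", written with Unicode escapes.
original = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"

# The text as a Greek (code page 1253) application would place it on the Clipboard.
single_byte = original.encode("cp1253")

# A Western European system interprets the same bytes with code page 1252.
garbled = single_byte.decode("cp1252")

# The bytes are intact, so decoding with the right code page recovers the text.
recovered = garbled.encode("cp1252").decode("cp1253")

print(garbled == original, recovered == original)
# False True
```

No data is destroyed at the moment of pasting; the bytes are simply read against the wrong table. The text becomes unrecoverable only if the code page that produced it is unknown.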

Support for Surrogate Character Pairs

Microsoft Project 2002 supports Traditional Chinese, Simplified Chinese, Japanese, and Korean ideographic languages. The original Unicode implementation captured only the most commonly used ideographs of these languages; however, there are enough ideographic characters that they could exhaust the entire set of code point assignments that Unicode provides. Surrogate characters are extensions of the existing Unicode standard that allow you to display a greater range of ideographic characters. Two areas of the 16-bit Unicode code page are reserved for this purpose, one referred to as the high end and the other as the low end; a value from each of the two reserved ranges is combined to represent a single character.

1,024 characters are reserved in the high end of the 16-bit Unicode code page, from U+D800 through U+DBFF. At the low end of the 16-bit Unicode code page, 1,024 characters are reserved, from U+DC00 through U+DFFF. One high-end value is combined with a low-end value to create a surrogate pair, a 32-bit character that maps to a real-world character at display time.

When creating surrogate pairs, keep the following in mind:

You cannot combine two characters from the high-end of the Unicode code page to form a pair. Likewise, you cannot combine two characters from the low-end of the Unicode code page to form a pair.

When creating a pair, the first character must always be from the high-end of the Unicode code page reserved characters, followed by a low-end character.

As long as you do not remove a surrogate character or insert another character between them, the integrity of the data in a surrogate pair is maintained.
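The rules above can be sketched in code. The arithmetic below is the standard UTF-16 mapping between a code point above U+FFFF and its surrogate pair; the function and constant names are chosen for this example and are not part of any Microsoft API.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high-end reserved range (first value)
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low-end reserved range (second value)


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000
    # Upper 10 bits go into the high surrogate, lower 10 bits into the low one.
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) pair into one code point, enforcing the order."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("first value must come from the high-end range")
    if not LOW_START <= low <= LOW_END:
        raise ValueError("second value must come from the low-end range")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


pair = to_surrogate_pair(0x20000)  # an ideograph outside the 16-bit range
print([f"{value:04X}" for value in pair])
# ['D840', 'DC00']
print(f"{from_surrogate_pair(*pair):X}")
# 20000
```

Passing two high-end values, two low-end values, or the two values in reverse order raises ValueError, which mirrors the pairing rules listed above.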

Surrogate Pair Support in Windows 2000

Windows 2000 and Windows XP support surrogate pairs, frequently referenced under the name "big character set support." Microsoft Project 2002 running under Windows 2000 or Windows XP handles surrogates by continuing to assume that each Unicode character in the pair is 16-bit. Surrogate pairs are processed in the same way as non-spacing marks.

Surrogates are treated as complex script. Because the two 16-bit characters combine to form a new 32-bit character, they need to be treated as such. Thus, you can no longer assume that each 16-bit Unicode value maps to exactly one character; a value in a surrogate range is only one half of a pair.

Also keep in mind, when range checking to detect a surrogate pair, that the API functions CharNext() and CharPrev(), as currently implemented in Windows 2000, do not treat a surrogate pair as a single character.
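The range check mentioned above can be sketched as follows. The example does not call CharNext() or CharPrev(); it is a Python stand-in that walks a sequence of 16-bit values and steps over a surrogate pair as one character. The helper names are hypothetical.

```python
import struct


def utf16_units(text: str) -> list:
    """Return the 16-bit values that make up text when stored as UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def count_characters(units: list) -> int:
    """Count characters, treating each valid surrogate pair as one character."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # Advance by two 16-bit values for a pair, otherwise by one.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


units = utf16_units("A\U00020000B")  # letter, ideograph U+20000, letter
print(len(units), count_characters(units))
# 4 3
```

Code that advances one 16-bit value at a time would report four characters here and could split the pair, which is the situation the range check is meant to prevent.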

If a computer is unable to display or support surrogate pairs for Traditional Chinese or Simplified Chinese, you may need to install the support package file, which is available from the Microsoft Download Center.
