Export (0) Print
Expand All
Automate Testing of Your Stored Procs
Streamline Your Database Setup Process with a Custom Installer
Stop SQL Injection Attacks Before They Stop You
Get a Lean, Mean Dev Machine with the Express Editions of Visual Basic and SQL Server 2005
Keep Bad Guys at Bay with the Advanced Security Features in SQL Server 2005
AMO Lets You Dig Deeper into Your Data from Your Own Applications
Make Sense of Your Web Feedback using SQL Server 2005
Fuzzy Lookups and Groupings Provide Powerful Data Cleansing Capabilities
Unearth the New Data Mining Features of Analysis Services 2005
Using XQuery, New Large DataTypes, and More
Expand Minimize

Microsoft Jet 4.0 Sorting APIs: DBCompareStringW and DBLCMapStringW

SQL Server 2005
 

Lee Woods
Microsoft Corporation

March 2007

Applies to:
   Microsoft Jet Database Engine 4.0
   SQL Server
   Microsoft Windows API

Summary: The Microsoft Jet 4.0 relational database engine utilizes character sorting tables when creating and querying field indexes made up of character data. Two Microsoft Windows APIs are generally utilized for this functionality: CompareString and LCMapString. (7 printed pages)

Contents

Introduction
DBCompareStringW Function
DBLCMapStringW Function

Introduction

The Microsoft Jet 4.0 relational database engine utilizes character sorting tables when creating and querying field indexes made up of character (string) data. Two Microsoft Windows APIs are generally utilized for this functionality: CompareString and LCMapString.

If Jet were to utilize these Windows APIs, however, a potential compatibility issue would exist for applications sharing Jet databases from different versions of Windows, or when databases are transported between computers running different versions of Windows. The compatibility issue that may exist because CompareString and LCMapString can yield different results across versions of Microsoft Windows, potentially creating index lookup failures when indexes are based on character (string) data.

To ensure 100 percent character indexing compatibility across all previous, current, and future versions of Microsoft Windows, Jet utilizes its own static versions of these APIs; DBCompareStringW and DBLCMapStringW. The Microsoft Jet Database Engine component MSWSTR10 exposes these APIs, which were created to stabilize the operation of the Windows sorting functions utilized by Jet, by freezing a single version of the Windows sorting tables (the sorts of Windows 2000). This effectively isolates it from any potential changes in sort tables by ensuring consistent and identical sorting results, regardless of the Windows version applications (and thus, Jet) is running on.

Table 1. Windows and Jet equivalent APIs

Windows Jet (MSWSTR10)
CompareStringDBCompareStringW
LCMapStringDBLCMapStringW

The input parameters and return values of each function are identical to their Windows equivalent functions, with the following exceptions: the DBLCMapStringW function only supports a sub-set of the mapping operations that the Windows function has, and the DBCompareStringW function supports an additional flag for doing prefix comparisons.

DBCompareStringW Function

The DBCompareStringW function compares two character strings, using the specified locale.

Syntax

int DBCompareStringW(      

    LCID Locale,
    DWORD dwCmpFlags,
    UNALIGNED WCHAR * lpString1,
    int cchCount1,
    UNALIGNED WCHAR * lpString2,
    int cchCount2);

Parameters

Locale

[in] Specifies the locale used for the comparison. This parameter can be one of the following predefined locale identifiers. This parameter can also be a locale identifier created by the MAKELCID macro.

dwCmpFlags

[in] Indicates what type of transformation is to occur during mapping. Several flags can be combined on a single transformation (though some combinations are illegal). Mapping options include:

Table 2. Mapping options for the DBCompareStringW function

FlagMeaning
NORM_IGNORECASEIf set, this flag will cause the function to ignore any differences in upper/lower case.
NORM_IGNORENONSPACENonspace marks are diacritical accents that are added to certain characters in many languages, either as an integral part of a new glyph or as a separate character that should be combined in some way with a base character. This flag, if set, will cause all nonspace marks to be masked off and ignored, whether they are separate characters or not.
NORM_IGNORESYMBOLSIf set, this flag will cause the function to ignore any punctuation and/or other symbol characters that appear in the string.
NORM_IGNOREKANATYPEIf set, this flag will cause the function to ignore any differences of Hiragana and Katakana form.
NORM_IGNOREWIDTHIf set, this flag will cause the function to ignore any differences of character byte size in the ANSII equivalent of the Unicode character.
SORT_STRINGSORTThis flag can be set to force the apostrophe and the hyphen to be treated as if they are regular symbol characters; see last comment in this section
NORM_PREFIXIf set, this flag requests that the two strings be considered equal if no collation differences are found by the time that lpString1 is used up; that is, lpString1 can be a prefix of lpString2.
NORM_PADSPACEIf set, this flag requests that, when the end of the shorter string is found, the code should continue matching against the longer string by inserting as many space characters as required to compare up to the end of the longer string. This flag will affect the results of comparing two strings that only differ because of control codes on the end of the longer string.

lpString1

[in] Pointer to the first Unicode string to be compared.

cchCount1

[in] Specifies the number of WCHARs in the string pointed to by the lpString1 parameter. The count does not include the null-terminator. If this parameter is any negative value, the string is assumed to be null terminated and the length is calculated automatically. If cchCount1 and cchCount2 are not –1, then the comparison continues for the number of characters specified. It will not terminate if a null-terminator is found within the string when cchCount1 and cchCount2 are not set to –1.

lpString2

[in] Pointer to the second Unicode string to be compared.

cchCount2

[in] Specifies the number of WCHARs in the string pointed to by the lpString2 parameter. The count does not include the null-terminator. If this parameter is any negative value, the string is assumed to be null terminated and the length is calculated automatically.

Return Value

Success:  1 (CSTR_LESS_THAN)—if lpString1 is less than lpString2.

                2 (CSTR_EQUAL)—if lpString1 is equal to lpString2.

                3 (CSTR_GREATER_THAN)—if lpString1 is greater than lpString2.

Failure:   0

Remarks

  • Note that if the return value is 2, the two strings are "equal" in the collation sense, though not necessarily identical (that is, the case might be ignored).
  • If the two strings are of different lengths, they are compared up to the length of the shortest one. If they are equal to that point, then the return value will indicate that the longer string is greater, unless the NORM_PREFIX flag is set and the first string is shorter; in this case, the two strings are considered to be equal.
  • All symbols will sort before any other alphanumeric. The hyphen/minus and apostrophe characters are treated as being symbol character if the SORT_STRINGSORT flag is specified; otherwise, they are ignored as symbols unless no other difference are found in the strings. This is to ensure that words like coop and co-op stay together within a list.
  • For Japanese Kana, using the NORM_IGNORENONSPACE will drop the DW field that contain the Daku-on and Handaku-on markers, and also will ignore the 4th and 5th weight fields in the XW field that contain the Repeat, Cho-on, and Small Kana character markers.
  • If the strings supplied contain Arabic Kashidas, the Kashidas will be ignored during the comparison. Therefore, if the two strings are identical save for Kashidas within the strings, DBCompareStringW will return a value of 2; that is, they are "equal" in the collation sense, though not necessarily identical.

DBLCMapStringW Function

The DBLCMapStringStringW function compares two character strings, using the specified locale. DBLCMapStringW provides sort key generation for database indexing. The sort key strings can be compared against each other by doing a byte by byte compare. In addition, this API provides mappings from one character string to another, performing the specified locale-dependent uppercase or lowercase translation. All other string mappings are not supported by the MSWSTR10.DLL.

Syntax

int DBLCMapStringW(      

    LCID Locale,
    DWORD dwMapFlags,
    UNALIGNED WCHAR * lpSrcStr,
    int cchSrc,
    UNALIGNED WCHAR * lpDestStr,
    int cchDest
);

Parameters

Locale

[in] Specifies the locale used for the comparison. This parameter can be one of the following predefined locale identifiers. This parameter can also be a locale identifier created by the MAKELCID macro.

dwCmpFlags

[in] Indicates what type of transformation is to occur during mapping. Several flags can be combined on a single transformation (though some combinations are illegal). Mapping options include:

Table 3. Mapping options for the DBLCMapStringStringW function

FlagMeaning
LCMAP_LOWERCASEIf set, this flag will convert all characters in the string to their lowercase equivalent characters.
LCMAP_UPPERCASEIf set, this flag will convert all characters in the string to their uppercase equivalent characters.
LCMAP_LINGUISTIC_CASINGValid in combination with LCMAP_LOWERCASE or LCMAP_UPPERCASE only. If set, this flag will cause the casing to be done based on linguistic rules rather than file system rules (default behavior).
LCMAP_SORTKEYIf set, this flag will convert the string into sort key (normalized form).
NORM_IGNORECASEValid in combination with LCMAP_SORTKEY. If set, this flag will cause the function to ignore any differences in upper/lower case.
NORM_IGNORENONSPACEValid in combination with LCMAP_SORTKEY. Nonspace marks are diacritical accents that are added to certain characters in many languages, either as an integral part of a new glyph or as a separate character that should be combined in some way with a base character. This flag, if set, will cause all nonspace marks to be masked off and ignored, whether they are separate characters or not.
NORM_IGNORESYMBOLSValid in combination with LCMAP_SORTKEY. If set, this flag will cause the function to ignore any punctuation and/or other symbol characters that appear in the string.
NORM_IGNOREKANATYPEValid in combination with LCMAP_SORTKEY. If set, this flag will cause the function to ignore any differences of Hiragana and Katakana form.
NORM_IGNOREWIDTHValid in combination with LCMAP_SORTKEY. If set, this flag will cause the function to ignore any differences of character byte size in the ANSII equivalent of the Unicode character.
SORT_STRINGSORTValid in combination with LCMAP_SORTKEY. This flag can be set to force the apostrophe and the hyphen to be treated as if they are regular symbol characters; see last comment in this section

lpSrcStr

[in] Pointer to the supplied string to be mapped.

cchSrc

[in] Character count of the input string buffer. This count can include the NULL terminator or not. If the NULL terminator is included in this count, it will not show up in the sort key value anyway, as a NULL is considered "unsortable" and a NULL is always mapped to a NULL, so it will not greatly affect the mapping behavior. Note that this is a WORD count for wide character strings and a BYTE count for multi-byte strings. If –1, lpSrcStr is assumed to be null-terminated; the length will be calculated automatically, and the return string will also be NULL terminated.

lpDestStr

[in] Pointer to the memory buffer to store the resulting mapped string. If LCMAP_SORTKEY is specified, the resulting string will always be terminated by a NULL byte, regardless of what value is in cchSrc, and the resulting buffer is an LPBYTE with the following format (see sorting section for details):

  • [all Unicode weights]0x01 [all Diacritic weights]0x01 [all Case weights]0x01 [all Special weights]0x00
  • If some of these weights are absent from the sort key string, due to ignore flags, the 0x01 separators and the NULL terminator are still present.

cchDest

[in] The character count of the memory buffer pointed to by lpDestStr, or the count of bytes if LCMAP_SORTKEY is specified. If the NULL terminator is included in cchSrc, then cchDest must also include the NULL terminator in the character count. If cchDest is 0, then the return value of this function is the number of characters required to hold the mapped string. The lpDestStr pointer is not referenced in this case. This is a WORD count for wide character strings and a BYTE count for multi-byte strings.

Return Value

Success: number of characters written to lpDestStr including the terminator.
              (OR the number of bytes if LCMAP_SORTKEY was specified.)

Failure:  0

Remarks

  • The mapped string will be null-terminated if the source string is null-terminated.
  • The lpSrcStr and lpDestStr pointers may not be the same.
  • The output string is not strictly in [WCHAR|CHAR] format, since some of the mapping flavors return byte values (sort key) rather than characters.
  • The Arabic Kashida is ignored. Thus, if you create a sort key for a string containing an Arabic Kashida, there will not be a sort key value for the Kashida.
  • All symbols will sort before any other alphanumeric. The hyphen/minus and apostrophe characters are treated as being symbol characters if the SORT_STRINGSORT flag is specified; otherwise they are ignored as symbols unless no other difference are found in the strings. This is to ensure that words like coop and co-op stay together within a list.
  • For Japanese Kana, using the NORM_IGNORENONSPACE will cause the DW field that contain the Daku-on and Handaku-on markers to be omitted from the output, as will the 4th and 5th weight fields in the XW field that contain the Repeat, Cho-on, and Small Kana character markers.
  • When the LCMAP_SORTKEY flag is used with any Chinese locale, it is preferable to use the NORM_IGNORENONSPACE flag, as well. This will result in a shorter sortkey; however, the end result of the sort with or without the NORM_IGNORENONSPACE flag will be identical. These weights are unnecessary in Chinese, but are necessary in other ideographic languages.

Security Alert  Using these functions incorrectly can compromise the security of your application. Strings that are not compared correctly can produce invalid input. Test strings to make sure they are valid before using them and provide error handlers. For more information, see Security Considerations: International Features.

Was this page helpful?
(1500 characters remaining)
Thank you for your feedback
Show:
© 2014 Microsoft