Unicode and ANSI Buffer Size Mismatches

11/12/2009

Buffer overruns caused by Unicode and ANSI buffer size mismatches are fairly common on Windows platforms. They happen when code confuses the number of elements (wide characters) in a Unicode buffer with the buffer's size in bytes. The mistake is widespread for two reasons. First, Microsoft Windows NT and later versions support both ANSI and Unicode strings. Second, most Unicode functions take buffer sizes in wide characters, not in bytes.

The most commonly used function that is vulnerable to this kind of bug is MultiByteToWideChar. Consider the following code:

BOOL GetName(char *szName)
{
    WCHAR wszUserName[256];
 
    // Convert ANSI name to Unicode.
    MultiByteToWideChar(CP_ACP, 0, 
                        szName,
                        -1, 
                        wszUserName,   
                        sizeof(wszUserName));   // Bug: 512 (bytes), not 256 (WCHARs)
    // Snip
    
}

The problem is the last argument of MultiByteToWideChar. This argument is the size of the destination buffer in wide characters, but the code passes sizeof(wszUserName). wszUserName is a Unicode buffer of 256 wide characters. Each wide character is two bytes, so sizeof(wszUserName) evaluates to 512. The function therefore believes the buffer holds 512 wide characters (1,024 bytes), twice its real size. If szName is long enough, MultiByteToWideChar writes past the end of wszUserName. Because wszUserName is on the stack, this is a potentially exploitable buffer overrun.
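You can check the arithmetic with a short, portable C sketch. It is illustrative only. It relies on the following hypothetical names, none of which are Windows APIs:

- `wchar16`: a 16-bit code unit that stands in for WCHAR. `wchar_t` is not used because it is 4 bytes on many non-Windows compilers.
- `ELEMENT_COUNT`, `bytes_assumed_by_callee`, and `overrun_bytes`: helpers defined in the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WCHAR: one 16-bit UTF-16 code unit. */
typedef uint16_t wchar16;

/* Element count of a true array. Gives a wrong answer if passed a pointer. */
#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Bytes a callee assumes it may write when told the buffer holds
   cch wide characters (the way MultiByteToWideChar treats its last argument). */
static size_t bytes_assumed_by_callee(size_t cch)
{
    return cch * sizeof(wchar16);
}

/* How many bytes past the end of a real_bytes-sized buffer the callee
   may write if it trusts cch; 0 means no overrun is possible. */
static size_t overrun_bytes(size_t real_bytes, size_t cch)
{
    size_t assumed = bytes_assumed_by_callee(cch);
    return assumed > real_bytes ? assumed - real_bytes : 0;
}
```

Apply this to a 256-element buffer. Passing sizeof(wszUserName) gives the callee a 1,024-byte budget, which is 512 bytes more than the stack buffer holds. Passing the element count gives exactly 512 bytes, so no overrun is possible.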

The correct call passes the buffer size in wide characters, not in bytes:

    MultiByteToWideChar(CP_ACP, 0, 
                        szName,
                        -1, 
                        wszUserName,   
                        sizeof(wszUserName) / sizeof(wszUserName[0]));
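The next sketch shows the same contract end to end: a destination size expressed in characters, plus a check of the return value. `widen_ascii` is a hypothetical, simplified stand-in for MultiByteToWideChar called with -1 as the source length, not the real API. It handles only 7-bit ASCII input. Unlike the real function, it writes nothing when the buffer is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t wchar16;                /* hypothetical stand-in for WCHAR */

#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Widens a NUL-terminated ASCII string into dst.
   cchDst is the capacity of dst in wide characters, NOT bytes.
   Returns the number of wide characters written, including the
   terminator, or 0 (writing nothing) if dst is too small. */
static size_t widen_ascii(const char *src, wchar16 *dst, size_t cchDst)
{
    assert(src != NULL && dst != NULL);
    size_t needed = strlen(src) + 1;     /* characters, including the NUL */
    if (needed > cchDst)
        return 0;                        /* refuse rather than overrun */
    for (size_t i = 0; i < needed; i++)
        dst[i] = (unsigned char)src[i];  /* ASCII maps 1:1 to UTF-16 */
    return needed;
}
```

Callers should treat a return value of 0 as failure, just as MultiByteToWideChar returns 0 on failure. On Windows, the ARRAYSIZE macro and the CRT's _countof macro compute the same element count as the sizeof division shown above. Every form of this calculation requires a real array. Inside a function that receives the buffer as a pointer parameter, the size must be passed in separately.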
