White Paper: Message Formatting

Article
07/22/2014

Preetam Pawar, Technical Lead, IN-BizAppsSvr

May 2011

Summary

This white paper discusses how to structure an e-mail message so that a client program can view the message exactly as it was sent from the source program.

Applies to

Microsoft Exchange Server 2010

Microsoft Exchange Server 2007

Microsoft Exchange Server 2003

Contents

- Introduction
- Structure of a message
- Message Content
- Message Header
- Header Folding
- Message Body
- Originator fields
- MIME
- MIME Headers
- MIME-Version
- Content-Type
- Message
- Multipart
- Discrete media types
- Text
- Content-Transfer-Encoding
- Character Encoding (charset)
- Additional Resources

Introduction

When an e-mail message is sent by using any messaging system, the message contains information and structure that enables the client to display the message as it was sent.

This document discusses the headers and the structure of these messages. These structures and formats are defined as MIME types, and they are discussed in various Request for Comments (RFC) documents, such as RFC 2821, RFC 2822, RFC 2045, RFC 2046, RFC 2047, and RFC 2049.

The examples in this white paper use .eml files because this file type gives the most complete example of a message when it is accessed from its source. To access the message source of an e-mail message, click File, click Properties, click Details, and then click Message Source.

The following figure shows how to access the message source of an e-mail message.

Figure Access the Message Source of an .eml file

Access the Message Source of an eml file

The following figure shows the message source of the e-mail message in the previous figure.

Figure Message Source of the .eml file

Message Source of the eml file

Structure of a message

To successfully structure a message, you must first understand the message format, which consists of the envelope and the content. These message components can be described as follows.

Message Envelope: The message envelope consists of information about the transmission and delivery of the message. This information is generated by the transmission process and is not a part of the message. The message envelope is created by the client who submits the message, and contains information relevant for successful transmission of the message. The message envelope is defined in RFC2821.

For more information about SMTP specifications, see RFC 2821.

Message content: The message content is the part of the e-mail message that is delivered to the recipient. This portion has two elements as defined in RFC2822: the message header and the message body. The e-mail client program uses this information to display the message.

Message Header: The message header is a collection of header fields.

Message Body: The message body is a collection of lines of US-ASCII text that follow the message headers.

The following figure illustrates the parts of a simple Internet message.

Figure A Simple Internet Message

Simple Internet Message

For more information about simple Internet messages, see the MSDN article, About Simple Internet (RFC 822) Messages

Message Content

In this section, we will examine the message content and RFC 2822 in more detail.

The RFC 2822 standard puts two limits on the number of characters in a line. Each line must be no more than 998 characters long, and should be no more than 78 characters long, excluding the carriage return/line feed (CRLF) character combination. This is known as the “998/78 character limitation.”

The 998 character limit is enforced because many programs that send, receive, or store Internet Message Format messages cannot support more than 998 characters on a line.

The 78 character recommendation is to support the user interfaces of programs that may truncate or wrap more than 78 characters per line when they display a message.
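The two limits can be checked mechanically. The following Python sketch is illustrative only: the function name check_line_lengths and the sample message are assumptions made for this example, not part of any Exchange tool. It splits raw message bytes on CRLF and reports which lines exceed the 998-character limit and which exceed the 78-character recommendation.

```python
def check_line_lengths(raw: bytes, hard_limit: int = 998, soft_limit: int = 78):
    """Return two lists of 1-based line numbers: (too_long, over_recommended)."""
    too_long, over_recommended = [], []
    # Splitting on CRLF removes the line terminator, so the lengths below
    # exclude the CRLF, as the limits in RFC 2822 do.
    for number, line in enumerate(raw.split(b"\r\n"), start=1):
        length = len(line)
        if length > hard_limit:
            too_long.append(number)
        elif length > soft_limit:
            over_recommended.append(number)
    return too_long, over_recommended


# Hypothetical sample: line 2 is 108 characters long, line 4 is 1000.
sample = (
    b"Subject: short\r\n"
    + b"X-Long: " + b"a" * 100 + b"\r\n"
    + b"\r\n"
    + b"b" * 1000 + b"\r\n"
)
violations, warnings = check_line_lengths(sample)
print("over 998:", violations, "over 78:", warnings)
```

A line that appears in the first list violates the standard. A line that appears only in the second list is valid but may be wrapped or truncated by some clients.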

Message Header

Each header field consists of a field name, followed by a colon (:) character, followed by a field body, and ended by a CRLF.

The field name is composed of printable US-ASCII characters, except for the colon (:) character. That is, US-ASCII characters that have values from 33 through 57 and 59 through 126 are permitted in a header field name.

A field body can be composed of any US-ASCII characters except CR and LF.

For more information, see the “Understanding the Structure of E-mail Messages” section of the topic Understanding Content Conversion
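The character rules for field names and field bodies can be expressed as two small checks. This is a minimal Python sketch; the function names are hypothetical and chosen only for this illustration.

```python
def is_valid_field_name(name: str) -> bool:
    """Field names use US-ASCII values 33-126, excluding the colon (58)."""
    return bool(name) and all(33 <= ord(ch) <= 126 and ch != ":" for ch in name)


def is_valid_field_body(body: str) -> bool:
    """Unfolded field bodies use any US-ASCII character except CR and LF."""
    return all(ord(ch) < 128 and ch not in "\r\n" for ch in body)


print(is_valid_field_name("Subject"))           # a typical field name
print(is_valid_field_name("Bad Name"))          # contains a space (value 32)
print(is_valid_field_body("This is a test"))    # plain US-ASCII text
```

The field body check applies to the unfolded value. A folded field body contains CRLF sequences that must be removed before the check is meaningful.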

To view the CRLF in a message header, use Notepad2 or Notepad++. The following figure shows a message header.

Figure Example of a header field

Example of a header field

Note

To obtain Notepad++, visit the following Notepad++ Web site: Notepad++.
To obtain Notepad2, visit the following Notepad2 Web site: Notepad2.

Header Folding

Header folding splits a field body across multiple lines by inserting a carriage return (CR) and line feed (LF) pair. Folding exists to accommodate the 998/78 character limitation per line.

A field body can contain a CRLF only when the CRLF is used for header folding.

Generally, wherever the standard allows folding white space (and not only white space characters), a CRLF can be inserted before the white space.

For example, consider the following header field:

Subject: This is a test

This can be represented as follows:

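Subject: This
 is a test

The second line begins with a space character, which marks it as a continuation of the previous line.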

For more information about header folding, view section 2.2.3 of RFC 2822. (For information about header folding syntax, see sections 3 and 4.)
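Unfolding reverses this process: every CRLF that is immediately followed by white space is removed, and the white space itself is kept. The following Python sketch shows the idea; the function name unfold is an assumption for this example.

```python
import re


def unfold(header_text: str) -> str:
    """Remove each CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", header_text)


folded = "Subject: This\r\n is a test"
print(unfold(folded))
```

A CRLF that is not followed by white space is left in place because it ends the header field instead of folding it.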

Message Body

The message body is a collection of lines of US-ASCII text characters that appear after the message header. The message header and the message body are separated by an empty line, that is, a line that contains only a CRLF. The message body is optional. Each line of text in the message body must be no more than 998 characters long, excluding the CRLF. The CR and LF characters can appear only together, as a CRLF pair that indicates the end of a line.

The following figure illustrates the header and body areas of a message.

Figure A message that has headers and body

A message with headers and body
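The separation of header and body at the first empty line can be sketched in a few lines of Python. The function name and the sample message are hypothetical, and the sketch ignores folding and MIME structure. It shows only the split itself, including the case in which the optional body is absent.

```python
def split_header_and_body(raw: str):
    """Split a raw message at the first empty line (CRLF followed by CRLF)."""
    header_block, separator, body = raw.partition("\r\n\r\n")
    if not separator:
        # No empty line was found: the message consists of headers only.
        header_block, body = raw.rstrip("\r\n"), ""
    return header_block, body


raw_message = (
    "From: author@example.com\r\n"
    "Subject: Hi\r\n"
    "\r\n"
    "Body line one\r\n"
    "Body line two\r\n"
)
headers, body = split_header_and_body(raw_message)
print(headers.split("\r\n"))
print(repr(body))
```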

Originator Fields

The message must include a From field and can include a Sender field and a Reply-To field. If the From field contains more than one mailbox, a Sender field must also be present. The From field identifies the author of the message. The Sender field identifies the mailbox that was used to send the message. If the author and the sender of the message are the same, the Sender field should not be used.

In Microsoft Exchange Server, the Sender and the From fields are seen in envelope journal messages and in Send on Behalf of messages.

For example, consider the fields in the following message:

From: <author@domain.com>

Subject: Testing Sender-field

Sender: <sender@domain.com>

To: <xyz@domain.com>

CC: <abc@domain.com>

The following example is from journaling:

MIME-Version: 1.0

From: author <author@domain.local>

Sender: <MicrosoftExchange329e71ec88ae4615bbc36ab6ce41109e@domain.local>

To: ukp <ukp@domain.local>

Subject: E2k3 to E2K7

Message-ID: <81dee15f-8a91-4660-bdb5-b16a8547f067@journal.report.generator>

Date: Fri, 5 Feb 2010 06:28:49 +0530

Content-Transfer-Encoding: binary

X-MS-Journal-Report:

Return-Path: <>

X-OriginalArrivalTime: 05 Feb 2010 00:58:49.0482 (UTC) FILETIME=[609F76A0:01CAA5FE]
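The originator rules from RFC 2822 can be checked with the Python standard library. The following is a sketch under stated assumptions: the function name check_originator_fields and the sample addresses are hypothetical, and only two rules are checked (a From field is required, and a Sender field is required when From lists more than one mailbox).

```python
from email import message_from_string
from email.utils import getaddresses


def check_originator_fields(raw_headers: str) -> list:
    """Return a list of problems found in the originator fields."""
    msg = message_from_string(raw_headers)
    problems = []
    # getaddresses() splits each From value into (display name, address) pairs.
    authors = getaddresses(msg.get_all("From", []))
    if not authors:
        problems.append("From field is missing")
    if len(authors) > 1 and msg["Sender"] is None:
        problems.append("Sender field is required when From lists more than one mailbox")
    return problems


single_author = "From: <author@domain.com>\r\nTo: <xyz@domain.com>\r\n\r\n"
two_authors = "From: <a@domain.com>, <b@domain.com>\r\nTo: <xyz@domain.com>\r\n\r\n"
print(check_originator_fields(single_author))
print(check_originator_fields(two_authors))
```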

MIME

In today’s world, messages are not sent only in US-ASCII text. MIME defines a message format that enables the following content:

Textual message bodies in character sets other than US-ASCII
Non-textual message bodies
Multipart message bodies
Textual header information in character sets other than US-ASCII

The following figure illustrates a simple MIME message.

Figure A Simple MIME Message

A Simple MIME Message

For more information about this simple MIME message example, see the following MSDN topic: Sample MIME Message

This example shows the use of a MIME message to send a text message and an attached text file. Both are body parts of this message.

The MIME-Version header informs the receiving client to treat this as a MIME message.

Because this is a multipart content type, a boundary is present. The boundary= parameter defines a string that separates the parts of the message. A MIME-compliant client displays or processes only the content that follows a boundary line. Each boundary line is constructed by prepending a double hyphen (--) to the boundary= string. The final body part is followed by the boundary= string with a double hyphen (--) both prepended and appended.

The following list explains the use of boundaries as they relate to the previous figure.

Defining a boundary:

Content-Type: multipart/mixed; boundary="XXXXboundary text"

Defining a body part:

--XXXXboundary text

Body part of the message:

--XXXXboundary text

Ending a boundary:

--XXXXboundary text--

A MIME-aware client does not display the “This is a multipart message in MIME format” message because it is outside the boundary.
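The way a MIME-aware client treats boundaries can be observed with the parser in the Python standard library. In the following sketch, the raw message is a hand-written approximation of the sample in the figure (an assumption, because the figure is not reproduced here). The parser returns two body parts and stores the text that precedes the first boundary separately, as a preamble.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="XXXXboundary text"\r\n'
    "\r\n"
    "This is a multipart message in MIME format.\r\n"
    "\r\n"
    "--XXXXboundary text\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "this is the body text\r\n"
    "\r\n"
    "--XXXXboundary text\r\n"
    "Content-Type: text/plain\r\n"
    'Content-Disposition: attachment; filename="test.txt"\r\n'
    "\r\n"
    "this is the attachment text\r\n"
    "\r\n"
    "--XXXXboundary text--\r\n"
)

msg = message_from_string(raw)
parts = msg.get_payload()  # a list of body parts for a multipart message

print("boundary:", msg.get_boundary())
print("number of body parts:", len(parts))
print("preamble (not a body part):", msg.preamble.strip())
print("attachment file name:", parts[1].get_filename())
```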

MIME Headers

MIME headers appear at the start of a MIME message and also in the separate body parts. Some of them can be used both as message headers and in MIME body parts. Some headers are defined for use only in body parts.

The following headers are defined in MIME:

MIME-Version
Content-Type
Content-Transfer-Encoding
Content-ID
Content-Disposition

Note

Headers that begin with "Content-" are the only headers that have defined meaning in body parts.

MIME-Version

With one exception, the MIME-Version header appears only once, in the top-level header of the message. The exception is message/rfc822, in which the encapsulated message has its own MIME-Version header. MIME-aware e-mail clients use the MIME-Version header field to identify a MIME-encoded message. When this header field is absent, MIME-aware e-mail clients identify the message as plain text.

Currently, "MIME-Version: 1.0" is the only accepted value.

Content-Type

The Content-Type header is central to a MIME-encoded message. It specifies the media type and subtype of the data in the body of a message, as described in RFC 2046. A media type consists of a type, a subtype, and optional parameters, such as a charset= parameter that defines the MIME character encoding. A subtype can also have a value such as X-Something or x-something, although these values are not standard. Subtypes begin with “vnd.” if they are vendor-specific. The following are examples of various Content-Type headers:

Standard

multipart/mixed

multipart/alternative

X-something

application/x-dvi: TeX device-independent (DVI) files

application/x-rar-compressed: RAR archive files

Vendor-specific (both Office 2007)

application/vnd.openxmlformats-officedocument.presentationml.presentation for pptx files

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet for xlsx files

The Internet Assigned Numbers Authority (IANA) maintains a list of registered media types. For more information about MIME media types, view the MIME Media Types Web page on the IANA Web site.
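A Content-Type value can be broken into its type, subtype, and parameters with the Python standard library. The sketch below is illustrative: the function name describe_content_type is hypothetical, and the "standard" label is a simplification based only on the subtype prefix. It does not consult the IANA registry.

```python
from email.message import Message

COMPOSITE_TYPES = {"message", "multipart"}


def describe_content_type(header_value: str) -> dict:
    """Split a Content-Type value into its type, subtype, and charset."""
    msg = Message()
    msg["Content-Type"] = header_value
    maintype = msg.get_content_maintype()
    subtype = msg.get_content_subtype()
    if subtype.startswith("vnd."):
        registration = "vendor-specific"
    elif subtype.startswith("x-"):
        registration = "non-standard"
    else:
        registration = "standard"  # simplification: prefix check only
    return {
        "type": maintype,
        "subtype": subtype,
        "composite": maintype in COMPOSITE_TYPES,
        "registration": registration,
        "charset": msg.get_param("charset"),
    }


for value in (
    'text/plain; charset="iso-8859-1"',
    "multipart/alternative",
    "application/x-rar-compressed",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
):
    print(describe_content_type(value))
```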

Content-Type headers can be either composite or discrete. The two composite top-level media types are as follows:

Message
Multipart

The five discrete top-level media types are as follows:

Text
Image
Audio
Video
Application

We will now look closer at the various media types and at examples of each.

Message

The message content-type enables messages to contain other messages or to contain pointers to other messages. The following are the different message content-types that can be included in a message.

message/delivery-status [RFC1894]

The message/delivery-status content type is defined for use in message delivery status notification. This type enables automated information transmission.

In the following example, a delivery report is generated by Message Router (MAILBUS) and gatewayed by PMDF_MR to a DSN. The gateway did not have sufficient information to supply an original-recipient address.

Figure An Example of message/delivery-status

An Example of message/delivery-status

message/external-body [RFC2046]

The message/external-body content type enables the contents of a message to be external to the message and simply referenced in the message. The only required parameter of this content type is access-type, which can have values such as "FTP" and "LOCAL-FILE." Values begin with "x-" if they have not been registered with IANA. Message/external-body parts must include a Content-ID header field that uses a unique identifier to reference the external data.

Figure An Example of message/external-body

An Example of message/external-body

When the message/external-body content type is used, consider the following history of the development of this content type.

The original MIME RFC, [RFC1521], enabled the body of an entity to be referenced externally instead of requiring the body of an entity to be inline. The current MIME RFC, [RFC2046], specifies the form of this construct. The security implications are as follows:

The blind retrieval of the content by the client can disclose information about the recipient.
The authentication mechanism tied to the retrieval (access-type parameter) can cause a pop-up dialog box, leading the user to expose credential information.
The server (Policy or delivery application) that is trying to check the content opens a denial of service vector for the remote host to tie up server resources.

For more information about the message/external-body content type, see the following MSDN Web site: [MS-OXCMAIL]: RFC2822 and MIME to E-Mail Object Conversion Protocol Specification.

message/partial [RFC2046]

The message/partial content type enables large messages to be broken up into smaller messages. The full message can be reassembled by the client or by the User Agent (UA). Only 7-bit content-transfer-encoding is allowed for this content type.

The following parameters are required:

ID: a unique identifier that is used to match up the pieces
Number: an integer identifying which piece of the message this is
Total: an integer indicating the total number of parts of the message (this parameter is required only on the final fragment of the message but should be used on all parts)

The following figures show the parts of a message/partial message, and the reassembled message.

Figure First part of message/partial

First part of message/partial

Figure Second part of message/partial

Second part of message/partial

Figure Client or User Agent(UA) reassembled message

Client or User Agent (UA) reassembling the message

Exchange 2007 does not support Message/partial. When Message/partial messages are sent to Exchange 2007, a “5.6.1 Messages of type message/partial are not supported” NDR message is generated.

An explanation of this behavior is provided in section “2.3.2 Message/Partial” and “4.4 Do Not Support Message/Partial” in Exchange Protocol Document, [MS-OXCMAIL]: RFC2822 and MIME to E-Mail Object Conversion Protocol Specification.

2.3.2 Message/Partial

The message/partial content type is not supported. MIME readers MUST reject messages that contain MIME entities with a message/partial Content-Type header field. This is to prevent virus scanning from being defeated by splitting up attachment content.

4.4 Do Not Support Message/Partial

“Message/partial” was originally designed to work around transmission failures during slow delivery that caused the complete message to be resent from scratch, and also to work around message size restrictions of implementations of protocols such as SMTP. With the advent of increased bandwidth speed and better connectivity, the long transmission times are mostly a thing of the past. Continued support for this content type allows an avenue for content that is inappropriate to reach (or leave) the e-mail client's computer. This could include things such as "Information disclosure" of proprietary information, unsolicited commercial e-mail (spam), and computer virus attachments.

E-mail servers try to protect their users from inappropriate content by implementing Policy applications that run as part of the protocol. For them to work efficiently, the complete content is incorporated into one message. For this reason, servers must not allow sending or receiving messages that have a “message/partial” content type.

message/rfc822

The message/rfc822 content type is used to enclose a complete message inside another message. It differs from other MIME body parts because it must be a fully formed RFC822 message, complete with headers.
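To illustrate the structure, the following Python sketch attaches one complete message to another by using the standard library. The addresses and subjects are placeholders. When a message object is passed to add_attachment, the library labels the new body part as message/rfc822 and keeps the inner message, with its own headers, intact.

```python
from email.message import EmailMessage

# The inner message is a complete message with its own headers and body.
inner = EmailMessage()
inner["From"] = "author@example.com"
inner["To"] = "recipient@example.com"
inner["Subject"] = "Original message"
inner.set_content("Original body text.")

# The outer message carries the inner message as an attachment.
outer = EmailMessage()
outer["From"] = "forwarder@example.com"
outer["To"] = "recipient@example.com"
outer["Subject"] = "Forwarded message attached"
outer.set_content("See the attached message.")
outer.add_attachment(inner)

attached = list(outer.iter_attachments())[0]
print(outer.get_content_type())
print(attached.get_content_type())
print(attached.get_payload(0)["Subject"])
```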

The following is an example of an attached message/rfc822.

Figure Example of an attached message/rfc822

Example of message/rfc822 attached message

The following figure provides a detailed look at the message and at the attachment.

Figure Example of message/rfc822

Example of message/rfc822

Message/rfc822 content type is also used by envelope journal messages.

Message/rfc822 messages have the following limitations:

Exchange 2007 Journal Reports lose header information in the Microsoft Office Outlook client when you configure Exchange 2007 to deliver journal reports to an Exchange 2003 mailbox.

https://support.microsoft.com/kb/972524

Also, if an .eml attachment is included in a MAPI message, the message content-type changes from MESSAGE/RFC822 to APPLICATION/OCTET-STREAM and the whole message becomes Base64 encoded. Some clients such as Pine, Simeon, and Netscape cannot open the message but can save attachments. Outlook Express can successfully open this kind of message. This was done to determine whether the sent message has an .eml or .msg attachment.

Multipart

Multipart Content-Type headers identify multipart messages. They require that a subtype and other elements, such as the boundary parameter, be included in the header.

multipart/alternative

This content type is used to specify the same content in different forms in different body parts. The body parts are positioned in increasing order of complexity.

The following example shows a multipart/alternative message.

Figure Example of Multipart/Alternative

Example of Multipart/Alternative

In this example, there are three body parts:

Text/plain
Text/enriched
Application/x-whatever

In a multipart/alternative message, the same content is included in each body part, but in specific formats. The message builds from the least complex to the most complex body part so that non-MIME clients can appreciate the benefits of the text/plain body part and MIME-aware clients can display the most complex body part they are able to support.
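The ordering from least complex to most complex can be reproduced with the Python standard library. This is a minimal sketch with placeholder content, not a description of how Exchange builds these messages. The plain-text part is added first, and the HTML alternative is added after it.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Alternative example"
# The simplest form goes first so that simple clients can read it.
msg.set_content("Hello in plain text.")
# A richer form of the same content follows.
msg.add_alternative(
    "<html><body><p>Hello in <b>HTML</b>.</p></body></html>", subtype="html"
)

part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type())
print(part_types)

# A client that supports HTML picks the richest part that it can display.
preferred = msg.get_body(preferencelist=("html", "plain"))
print(preferred.get_content_type())
```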

To send multipart/alternative messages in Exchange 2003, follow these steps:

Start Exchange System Manager.
Expand Global Settings, and then click Internet Message Formats.
Right-click Default, and then click Properties.
On the Message Format tab, select MIME in the Message encoding area, and then click Both.

To enable sending messages in multipart/alternative format in Exchange 2010 or in Exchange 2007, run the following cmdlet in Exchange Management Shell:

Set-RemoteDomain -Identity <RemoteDomainIdParameter> -ContentType MIMEHtmlText

multipart/mixed

This is the content type that is most frequently used to send an e-mail message that contains a body and attachments.

The multipart/mixed content type is used when the body parts are independent and have to be bundled in a particular order. When a client does not recognize a multipart subtype, it will treat the message as multipart/mixed.

Multipart/mixed specifies that the order of the body parts is important.
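The following Python sketch builds a small multipart/mixed message with the standard library. The file name and the attachment bytes are placeholders. The body part is created first and the attachment second, which is the order that the scenarios in this section recommend.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Mixed example"
# The message body is the first body part.
msg.set_content("Message body text.")
# The attachment is the second body part.
msg.add_attachment(
    b"placeholder attachment bytes",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

parts = list(msg.iter_parts())
print(msg.get_content_type())
print([part.get_content_type() for part in parts])
print(parts[1].get_filename())
```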

The following example shows a multipart/mixed content type message.

Figure Example of Multipart/Mixed

Example of Multipart/Mixed

Consider the following multipart/mixed content-type scenarios.

The message body part should be presented first, followed by the attachment, exactly as shown in the previous example. If the attachment is specified before the message body part, the message body is shown as an attachment instead of the intended attachment. For more information, see Microsoft Knowledge Base article 969854, The body of a message is shown incorrectly as an attachment if you try to use an application in an Exchange Server environment to send a message that includes attachments.

When you send a message that has two or more attachments, and the message has an inline attachment and an attachment body part, the inline body part appears before the attachment body part. The following example shows a multipart/mixed message that has an inline attachment.

Figure Example of multipart/mixed with inline attachment

When you send a message that has two or more attachments but that does not have an inline attachment, the order of the attachment body parts is not important.

The following example shows a multipart/mixed message that has two attachments.

Figure Example of multipart/mixed with two attachments

multipart/digest

The multipart/digest content type is used to send collections of plain-text messages. This is accomplished in a similar manner as for the multipart/mixed content type. However, every body part is expected to be the message/rfc822 content-type.

The following figures provide an example of a multipart/digest content type message.

Figure Example of Multipart/Digest

Example of a Multipart/Digest email message

Figure Details of the Multipart/Digest message

Details of the Multipart/Digest message

multipart/parallel

The purpose of the multipart/parallel content type is to display all the parts at the same time on hardware and software that can support them. For instance, an image file can be displayed while a sound file is playing.

The following example shows a multipart/parallel message.

Figure Example of multipart/parallel message

Example of multipart/parallel message

When you compare the syntax of the multipart/mixed content type and the multipart/parallel content type, you will see that they are identical. For comparison, the multipart/parallel message example shows multipart/mixed and multipart/parallel in the same message.

multipart/related

The multipart/related content type is the most frequently used content-type after multipart/mixed. This is used mostly in conjunction with HTML data.

The multipart/related content type is used for compound documents. These are messages in which the separate body parts are intended to work together to provide the full meaning of the message.

Additionally, multipart/related can be used to provide links to content that is not contained in the message, or to provide a reference to an object in the message by using the content-id parameter. Multipart/related can be used for compound documents if the object is built progressively from pieces, starting with the "root" body part as specified in the start parameter. If the start parameter is not specified, the first body part is considered the starting point or "root" body part. Multipart/related requires a type parameter. The type parameter specifies the content type of the first or "root" part. Multipart/related processing takes precedence over content-disposition.

Many MIME user agents do not recognize the multipart/related content type. Instead, they treat these messages as multipart/mixed content type. To account for this, messages include the technically unnecessary Content-Disposition header in multipart/related body parts.

Content-location and Content-base headers are used to reference links that are external to the body of the message.

The following example shows a multipart/related message that contains Content-Location and Content-Base headers.

Figure Example of multipart/related message when Content-Base and Content-Location headers are used

Multipart/related with Content-Base and Content-Location

The following figures show examples of a multipart/related message.

Figure Example of multipart/related message

Example of multipart/related message

Figure Details of the multipart/related message

Details of the multipart/related message

Note

The content-id header is specified in the Attachment body part and uses cid. It is referenced in the body part in which it must be used.
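The relationship between a Content-ID header and a cid reference can be sketched with the Python standard library. The domain, the HTML text, and the image bytes below are placeholders (the bytes are not a valid image). The HTML root part refers to the related part by its Content-ID value without the angle brackets.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# make_msgid() returns a unique identifier in angle brackets.
image_cid = make_msgid(domain="example.com")

msg = EmailMessage()
msg["Subject"] = "Related example"
# The root part refers to the image through a cid: URL.
msg.set_content(
    '<html><body><img src="cid:{}"></body></html>'.format(image_cid[1:-1]),
    subtype="html",
)
# The related part carries the matching Content-ID header.
msg.add_related(
    b"placeholder image bytes", maintype="image", subtype="png", cid=image_cid
)

root_part, image_part = list(msg.iter_parts())
print(msg.get_content_type())
print(image_part["Content-ID"])
```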

For more information about multipart/related content type messages, see Microsoft Knowledge Base article 954684, You cannot use an Outlook 2007 client to display or download an attachment when you access a message that includes an inline attachment from Exchange Server 2007

multipart/report

The multipart/report content type was defined for returning delivery status reports that include optional messages. This content type is finding wider use in computer-to-computer communication. The multipart/report is used for Message Disposition Notification.

An example of this content type is shown in the message/delivery-status figure.

Discrete media types

Text

The text content type is used for message content that is primarily in human-readable text character format. The more complex text content types are defined and identified so that an appropriate tool can be used to display that body part.

text/enriched

The text/enriched content type is intended to make multi-font, formatted e-mail widely readable. It uses a very limited set of formatting commands, each of which begins with <commandname> and ends with </commandname>. These formatting commands affect the formatting of the text between these two tokens.

The following example shows a text/enriched content type message.

Figure Example of text/enriched

Example of text/enriched content type

text/html

The text/html content type is an Internet Media Type in addition to a MIME content type. Using HTML in MIME messages allows the full richness of Web pages to be available in e-mail.

The following example shows an HTML e-mail message and the detailed structure of the same message.

Figure An example of HTML e-mail

An example of HTML email

Figure Details of the HTML e-mail

Details of the HTML e-mail

The global structure of an HTML document consists of the following parts:

A line that contains HTML version information
A declarative header section
A body that contains the document's actual content (the body may be implemented by the BODY element or the FRAMESET element)

The following example shows the HTML data.

Figure An example of HTML data

An example of HTML data

For more information, see The global structure of an HTML document

text/plain

The text/plain content type is the generic subtype for plain text. It is the default specified by RFC 822.

The following example shows a text/plain content type message.

Figure An example of text/plain message

An example of text/plain message

text/rfc822-headers

The text/RFC822-headers content type provides a mechanism for an MTA to label and return only the RFC 822 headers of a failed message instead of returning the complete message. The returned headers are useful for identifying the failed message and for diagnosing delivery problems. All headers are returned, up to the blank line following the headers.

The following example shows a text/rfc822-headers content type message.

Figure An example of text/rfc822-headers

An example of text/rfc822-headers

UUENCODE Attachment Format

The Unix-to-Unix encode (UUENCODE) format provided one of the earliest ways to add attachments to messages. In the UUENCODE format, attachments are appended to the message body after they are encoded by using the UUENCODE algorithm. Each attachment is prefixed with the file name and the encoding end string. Multiple attachments are individually appended in sequence and separated by a blank line. In the UUENCODE attachment format, the message body consists of only two basic parts: the message text and the message attachments.

The following example shows a message that includes a UUENCODE attachment and a detailed view of the structure of this content type.

Figure Message with UUENCODE attachment

Message with uuencode attachment

Figure Details of the UUENCODE message

Details of the uuencode message

The format of the UUENCODE message is as follows:

A UUENCODE message starts with the following line:

begin <mode> <file>

In this line, <mode> represents the file's Unix read/write/execute permissions as three octal digits, and <file> represents the name that will be used to re-create the binary data.

The file ends with two trailer lines:

`
end

Each data line except the last starts with “M” (which indicates 45 bytes of encoded data).
The grave accent character “`” is used in place of the space character.
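The format described above can be reproduced with the binascii module in the Python standard library. The following sketch is an illustration only: the function name uuencode, the default mode 644, and the sample data are assumptions. It encodes the data 45 bytes at a time, so each full line starts with "M", and it writes the grave accent in place of the space character.

```python
import binascii


def uuencode(data: bytes, filename: str, mode: str = "644") -> str:
    """Return data in UUENCODE format with the begin line and trailer lines."""
    lines = ["begin {} {}".format(mode, filename)]
    for offset in range(0, len(data), 45):
        chunk = data[offset:offset + 45]
        # backtick=True writes "`" where the encoding would produce a space.
        encoded = binascii.b2a_uu(chunk, backtick=True)
        lines.append(encoded.decode("ascii").rstrip("\n"))
    lines.append("`")    # first trailer line: a line with zero data bytes
    lines.append("end")  # second trailer line
    return "\n".join(lines) + "\n"


sample_data = bytes(range(90)) + b"xyz"  # two full lines and one short line
encoded_lines = uuencode(sample_data, "test.bin").splitlines()
for line in encoded_lines:
    print(line)
```

The first character of each data line encodes the number of bytes on that line as the character with the value 32 plus the count, which is why a full 45-byte line starts with "M" (32 + 45 = 77).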

MIME clients do not recognize the previous format, and they expect the message to be in the correct MIME format.

MIME clients expect the message to be in multipart/mixed format. The message should have a text/plain body, and the attachment body part should be uuencoded.

Limitations of UUENCODE format

Although the UUENCODE format lets you add attachments to messages, it does not define how to complete the following tasks:

Indicate the type of attachment, except through the file's extension
Specify alternate character encoding for the message text to support international languages
Relate groups of attachments
Indicate that the message text is a form of rich text, such as HTML or Rich Text Format (RTF) formatted text
Provide future enhancements to a structure of complex message bodies (the UUENCODE attachment format is neither flexible nor descriptive)

The following example shows a MIME-formatted UUENCODE message.

Figure MIME formatted UUENCODE message

MIME formatted UUENCODE message

For more information about the UUENCODE format, see the following MSDN and Wikipedia articles:

Analysis of Non-MIME Content

UUENCODE Attachment Format

Uuencoding

Transport Neutral Encapsulation Format (TNEF)

A TNEF message contains a plain text version of the message and an attachment that packages the original formatted version of the message. The attachment is named Winmail.dat.

The Winmail.dat attachment includes the following information:

The original formatted version of the message that includes, for example, fonts, text sizes, and text colors
OLE objects that include, for example, embedded pictures or embedded Microsoft Office documents
Special Outlook features that include, for example, custom forms, voting buttons or meeting requests
Regular message attachments that were in the original message

The resulting plain text message can be represented in the following formats:

An RFC 2822-compliant message composed of only US-ASCII text
A multipart MIME-encoded message that has a Winmail.dat attachment

Encoding options for Winmail.dat

Winmail.dat can be MIME-encoded or UUENCODE-encoded. A TNEF-aware client can decode either form successfully.

The following example shows a winmail.dat message that is encoded in MIME and UUENCODE.

Figure Winmail.dat encoded in MIME and UUENCODE

Winmail.dat encoded in MIME and UUENCODE

For more information about encoding options for Winmail.dat, see

241538: Description of Transport Neutral Encapsulation Format (TNEF) in Outlook 2000

The following example shows a MS-Tnef message.

Figure Example of a MS-Tnef message

Example of a MS-Tnef message

Summary Transport Neutral Encoding Format (STNEF) messages are encoded differently than TNEF messages. The following characteristics describe how these messages are encoded:

Always MIME encoded
Have Content-Transfer-Encoding: binary
Have no plain-text body and no distinct Winmail.dat attachment
Are transferred by using the SMTP BDAT command instead of the DATA command
Can be transferred only between SMTP messaging servers that support and advertise the BINARYMIME and CHUNKING SMTP extensions as defined in RFC 3030

The following example shows an STNEF message.

Figure Example of STNEF message

Example of STNEF message

STNEF is understood by Microsoft Exchange 2000 Server and later versions. STNEF is automatically used by Exchange if the following conditions are true:

Exchange 2000 Server: STNEF is used for messages that are transferred between Exchange servers that are in the same routing group. An unsupported hotfix also enables Exchange 2000 Server to use STNEF for messages that are transferred between Exchange servers in different routing groups.
Exchange 2003: If the Exchange organization is in native mode, STNEF is used for all messages that are transferred between Exchange servers in the organization.
Exchange 2007: STNEF is used for all messages that are transferred between Exchange servers in the organization.

Exchange never sends STNEF messages to external recipients. Only TNEF messages can be sent to recipients outside the Exchange organization.

For more information about STNEF messages, see Understanding Content Conversion.

Content-Transfer-Encoding

The Content-Transfer-Encoding header field can describe the following information about a message:

The encoding algorithm that was used to transform any non-US-ASCII text or binary data that is located in the message body
An indicator that describes the current condition of the message body

There can be many values of the Content-Transfer-Encoding header field in a MIME message. When the Content-Transfer-Encoding header field appears in the message header, it applies to the whole body of the message. When the Content-Transfer-Encoding header field appears in one of the parts of a multipart message, it applies only to that part of the message.
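The following Python sketch shows one way to list the Content-Transfer-Encoding value of each part of a multipart message by using the standard email package. The raw message, its boundary string, and the helper name encodings_by_part are made up for illustration.

```python
from email import message_from_string

# Hypothetical two-part message; the boundary "XX" is arbitrary.
RAW_MESSAGE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hello
--XX
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

WENP
--XX--
"""


def encodings_by_part(msg):
    """Return (content type, Content-Transfer-Encoding) for the message and each part."""
    return [
        (part.get_content_type(), part.get("Content-Transfer-Encoding"))
        for part in msg.walk()
    ]


for content_type, encoding in encodings_by_part(message_from_string(RAW_MESSAGE)):
    print(content_type, encoding)
```

In this sample the header is absent from the message header, so the first entry reports None, and each body part reports its own value.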

The purpose of encoding is to convert the data into US-ASCII. This is required so that the data can pass through an SMTP host successfully. Many old SMTP messaging servers support only US-ASCII messages.

The following values of Content-Transfer-Encoding can be used in Internet SMTP messages:

7-bit (the default value, which indicates that the content is already US-ASCII text and that no encoding was applied)
Base64 and quoted-printable (encoding schemes that make sure that the content will correctly pass through all messaging servers)
8-bit and Binary content-transfer (defined to explicitly identify content that may require processing or encoding before it is packaged for Internet transfer)

Base64

The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable. The encoding and decoding algorithms are simple, but the encoded data is about 33 percent larger than the unencoded data. The encoding is almost identical to the encoding that is used in Privacy Enhanced Mail (PEM) applications.

Note

The Base64 encoding is adapted from RFC 1421. However, Base64 eliminates the "*" mechanism for embedded clear text.

For more information, see RFC 1421.

Base64 processes the input data as 24-bit groups (three octets). Each 24-bit group is mapped to four encoded characters, which is why Base64 is sometimes referred to as 3-to-4 encoding. Each 6-bit group of the 24-bit group is used as an index into a mapping table, known as the Base64 alphabet, to obtain a character for the encoded data. Encoded lines are limited to 76 characters.

The following figure shows the US-ASCII code chart.

Figure US-ASCII code chart

In the chart, each US-ASCII character is identified by seven bits, written b7 b6 b5 b4 b3 b2 b1 from the most significant bit to the least significant bit.

For more information, see the following topic on the Wikipedia Web site: ASCII.

The following example shows the conversion to Base64 for the three US-ASCII characters XCO.

1. Find the 8-bit binary equivalent of each character. (The US-ASCII table is 7-bit, so a leading zero is added to fill eight bits.)

X is 01011000

C is 01000011

O is 01001111

2. Concatenate the binary data from left to right:

010110000100001101001111

3. Divide the binary data into groups of six bits:

010110 000100 001101 001111

4. Convert each six-bit group to a decimal value:

22 4 13 15

5. Look up each decimal value in the Base64 alphabet table to determine the Base64 character:

W E N P

In this example, the Base64 encoding of the three US-ASCII characters XCO is WENP.
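The steps in this example can be sketched in Python as follows. The helper name encode_24_bit_group is hypothetical, and the sketch handles only one complete 24-bit group with no padding. The result can be compared with the output of the standard base64 module.

```python
import base64
import string

# Base64 alphabet: index 0-25 is A-Z, 26-51 is a-z, 52-61 is 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_24_bit_group(three_octets: bytes) -> str:
    """Encode exactly three octets (one 24-bit group) as four Base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("expected exactly three octets")
    # Steps 1 and 2: write each octet as eight bits and concatenate left to right.
    bits = "".join(f"{octet:08b}" for octet in three_octets)
    # Step 3: divide the 24 bits into four groups of six bits.
    groups = [bits[i:i + 6] for i in range(0, 24, 6)]
    # Step 4: convert each six-bit group to a decimal index.
    indexes = [int(group, 2) for group in groups]
    # Step 5: look up each index in the Base64 alphabet.
    return "".join(ALPHABET[index] for index in indexes)


print(encode_24_bit_group(b"XCO"))            # WENP
print(base64.b64encode(b"XCO").decode("ascii"))  # WENP
```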

The following figure shows the Base64 character table:

Figure Base64 character table

For more information, see the following topic on the Wikipedia Web site: Base64.

Padding

Padding is special processing that is performed if fewer than 24 bits are available at the end of the data that is being encoded in Base64. A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. The '=' character is then used to pad the encoded output to four characters.

Because all Base64 input is an integral number of octets, only the following cases can occur:

The final quantum of encoding input is an integral multiple of 24 bits, and the final unit of encoded output will be an integral multiple of 4 characters without '=' padding
The final quantum of encoding input is exactly 8 bits, and the final unit of encoded output will be two characters followed by two '=' padding characters
The final quantum of encoding input is exactly 16 bits, and the final unit of encoded output will be three characters followed by one '=' padding character
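The three cases can be checked with the standard Python base64 module. In this sketch, padding_count is a hypothetical helper that derives the number of '=' characters from the input length alone.

```python
import base64


def padding_count(data: bytes) -> int:
    """Return the number of '=' characters that Base64 appends for this input."""
    leftover = len(data) % 3      # octets in the final, incomplete 24-bit group
    return (3 - leftover) % 3     # 0 left over -> 0, 1 left over -> 2, 2 left over -> 1


for sample in (b"XCO", b"X", b"XC"):
    encoded = base64.b64encode(sample).decode("ascii")
    print(len(sample) * 8, "bits:", encoded, "padding:", padding_count(sample))
```

For these inputs, the encoded values are WENP (no padding), WA== (two padding characters), and WEM= (one padding character).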

Quoted-printable

Quoted-printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded is mostly ASCII text, the encoded form of the data remains largely recognizable by humans. A body that is entirely ASCII may also be encoded in Quoted-Printable. This helps the integrity of the data if the message passes through a character-translating or line-wrapping gateway. All printable US-ASCII text characters except the equal sign (=) can be represented without encoding.

For example, consider the following text:

Please consider the environment before printing this e-mail. Please consider the environment before printing this e-mail. Please consider the environment before printing this e-mail.

This text can be represented in the Quoted Printable encoding as follows:

Please consider the environment before printing this e-mail. Please conside=
r the environment before printing this e-mail. Please consider the environm=
ent before printing this e-mail.

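The Quoted-Printable encoding requires that encoded lines be no more than 76 characters long. Longer lines are split by using soft line breaks: an equal sign (=) at the end of an encoded line indicates that the text continues on the next line.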

Warning

The 76 character limit does not count the trailing CRLF but does count all other characters, including the equal sign.
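The line length rule can be observed with the standard Python quopri module. This is only a sketch: the exact positions at which a particular encoder inserts soft line breaks may differ from the example shown earlier, so the code prints the line lengths instead of assuming them.

```python
import quopri

sentence = b"Please consider the environment before printing this e-mail."
text = b" ".join([sentence] * 3)

encoded = quopri.encodestring(text)

# Each encoded line, including a trailing '=' soft line break, stays within 76 characters.
for line in encoded.splitlines():
    print(len(line), line.decode("ascii"))

# Decoding removes the soft line breaks and restores the original text.
print(quopri.decodestring(encoded) == text)

# The equal sign itself must be encoded, as =3D.
print(quopri.encodestring(b"a=b"))
```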

When an encoding algorithm is not used on the message body, the Content-Transfer-Encoding header field identifies the current condition of the message body data.

7-bit

This value indicates that the message body data is already in the RFC 2822 format. Specifically, this means that the following conditions must be true:

All lines of text must be no more than 998 characters long, not counting the trailing CRLF.
All characters must be US-ASCII text that has character values 1 through 127, inclusive.
The CR and LF characters can be used together only to indicate the end of a line of text.

The whole message body may be 7-bit, or part of the message body in a multipart message may be 7-bit. If the multipart message contains other parts that have any binary data or non-US-ASCII text, those parts must be encoded by using the Quoted-printable or Base64 encoding algorithms.
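The three conditions for 7-bit data can be expressed as a small check. The following Python sketch uses a hypothetical helper named is_7bit, and it applies the line length limit as RFC 2045 states it (998 or fewer characters per line, not counting the CRLF).

```python
import re

# A CR that is not followed by LF, or an LF that is not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def is_7bit(body: bytes) -> bool:
    """Return True if the body satisfies the conditions for 7-bit data."""
    # Every octet must have a value from 1 through 127.
    if any(octet == 0 or octet > 127 for octet in body):
        return False
    # CR and LF may appear only together, as the CRLF end-of-line sequence.
    if BARE_CR_OR_LF.search(body):
        return False
    # No line may be longer than 998 characters, not counting the CRLF.
    return all(len(line) <= 998 for line in body.split(b"\r\n"))


print(is_7bit(b"Hello\r\nWorld\r\n"))
print(is_7bit("caf\u00e9\r\n".encode("utf-8")))
```

The first call returns True. The second call returns False because the UTF-8 encoding of the accented character uses octets above 127.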

Note

Messages that have 7-bit bodies can travel between SMTP messaging servers by using the standard DATA command.

Figure Message submitted using Data command

8-bit

This value indicates that the message body may contain non-US-ASCII text. Specifically, this means that the following conditions must be true:

All lines of text must be no more than 998 characters long, not counting the trailing CRLF.
Characters can have values greater than 127, but the NUL character (value 0) is not permitted.
The CR and LF characters can be used together only to indicate the end of a line of text.

The whole message body may be 8-bit, or part of the message body in a multipart message may be 8-bit. If the multipart message contains other parts that have any binary data, those parts must be encoded by using the Quoted-printable or Base64 encoding algorithms.

Note

Messages that have 8-bit bodies can only travel between SMTP messaging servers that support the 8BITMIME SMTP extension as defined in RFC 1652, such as Exchange 2000 Server or later versions.

Specifically, this means that the following conditions must be true:

The 8BITMIME keyword must be advertised in the server's EHLO response.
Messages are still transferred by using the SMTP standard DATA command. However, the BODY=8BITMIME parameter must be added to the end of the MAIL FROM command.
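The following Python sketch shows how a sending client might apply these two conditions. The EHLO response text, the host name, the sender address, and the helper names ehlo_keywords and mail_from are hypothetical. A real client would read the response from the server connection.

```python
def ehlo_keywords(response: str) -> set:
    """Collect the extension keywords that a server advertises in its EHLO response."""
    keywords = set()
    # The first line is the greeting; each later line names one extension.
    for line in response.splitlines()[1:]:
        fields = line[4:].split()   # drop the "250-" or "250 " reply prefix
        if fields:
            keywords.add(fields[0].upper())
    return keywords


def mail_from(sender: str, keywords: set) -> str:
    """Build the MAIL FROM command, adding BODY=8BITMIME only if it is advertised."""
    command = f"MAIL FROM:<{sender}>"
    if "8BITMIME" in keywords:
        command += " BODY=8BITMIME"
    return command


# Hypothetical EHLO response from a server that supports 8BITMIME.
EHLO_RESPONSE = (
    "250-mail.contoso.com Hello\r\n"
    "250-SIZE 10485760\r\n"
    "250-8BITMIME\r\n"
    "250 CHUNKING\r\n"
)

advertised = ehlo_keywords(EHLO_RESPONSE)
print(mail_from("kim@contoso.com", advertised))
```

If the keyword is not advertised, this sketch falls back to a plain MAIL FROM command, in which case the body would first have to be converted to 7-bit content.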

The following example shows an 8-bit MIME message.

Figure A message delivery of 8-bit MIME message

Binary

This value indicates that the message body contains non-US-ASCII text or binary data. Specifically, this means that the following conditions are true:

Any sequence of characters is permitted.
There is no line length limitation.
Binary message elements do not require encoding.

Messages that have binary bodies can travel only between SMTP messaging servers that support the BINARYMIME SMTP extension as defined in RFC 3030, such as Exchange 2000 Server or a later version.

Specifically, this means that the following conditions must be true:

The BINARYMIME keyword must be advertised in the server's EHLO response.
The BINARYMIME SMTP extension can only be used with the CHUNKING SMTP extension. Chunking enables large message bodies to be sent in multiple, smaller chunks. Chunking is also defined in RFC 3030. The CHUNKING keyword must also be advertised in the server's EHLO response.
Messages are transferred by using the BDAT command instead of the standard DATA command.
The BODY=BINARYMIME parameter must be added to the end of the MAIL FROM command if the message has a message body.
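To illustrate how chunking divides a message body, the following Python sketch generates the sequence of BDAT commands for a payload. It follows the BDAT syntax in RFC 3030, in which each command carries the chunk size in octets and the final command adds the LAST keyword. The helper name bdat_commands and the chunk size are hypothetical, and the sketch does not open a network connection.

```python
def bdat_commands(payload: bytes, chunk_size: int):
    """Yield (command, chunk) pairs; the final command carries the LAST keyword."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    if not chunks:
        chunks = [b""]   # an empty body is still terminated by "BDAT 0 LAST"
    for index, chunk in enumerate(chunks):
        last = " LAST" if index == len(chunks) - 1 else ""
        yield f"BDAT {len(chunk)}{last}", chunk


# Hypothetical 10-octet binary payload sent in chunks of up to 4 octets.
for command, chunk in bdat_commands(bytes(range(10)), 4):
    print(command, chunk)
```

Each chunk is sent immediately after its command with no additional encoding, which is why no line length or character restrictions apply.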

Note

Binary encoded messages are not valid Internet messages.

The following example shows a message that is sent by using the BDAT command with chunking.

Figure Example of BDAT with chunking

The following example shows a message that is sent by pipelining Binary MIME.

Figure Example of pipelining Binary MIME

The following list describes the key points of Content-Transfer-Encoding:

The 7-bit, 8-bit, and Binary values never exist together in the same multipart message. The values are mutually exclusive.
The Quoted-Printable or Base64 values may appear in a 7-bit or 8-bit multipart message body, but never in a binary message body.
If a multipart message body contains different parts that are composed of 7-bit and 8-bit content, the whole message is classified as 8-bit.
If a multipart message body contains different parts composed of 7-bit content, 8-bit content, and binary content, the whole message is classified as binary.
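The last two key points amount to taking the least restrictive value found among the parts. The following Python sketch expresses that rule. The helper name classify_message is hypothetical, and treating quoted-printable and Base64 parts as 7-bit content is an assumption based on the fact that their encoded output is US-ASCII text.

```python
# Order of the identity values, from most restrictive to least restrictive.
RANK = {"7bit": 0, "8bit": 1, "binary": 2}


def classify_message(part_encodings):
    """Return the overall classification for the Content-Transfer-Encoding values of the parts."""
    highest = "7bit"
    for value in part_encodings:
        value = value.lower()
        # Assumption: encoded parts contain only US-ASCII output, so they count as 7bit.
        if value in ("quoted-printable", "base64"):
            value = "7bit"
        if RANK[value] > RANK[highest]:
            highest = value
    return highest


print(classify_message(["7bit", "base64"]))
print(classify_message(["7bit", "8bit"]))
print(classify_message(["7bit", "8bit", "binary"]))
```

These calls return 7bit, 8bit, and binary, respectively.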

Content-Disposition

The Content-Disposition header field instructs a MIME-enabled e-mail client how it should display an attached file. The values of this field may be Inline or Attachment.

When the value of this field is Inline, the attachment is displayed in the message body.

When the value of this field is Attachment, the attached file appears as a regular attachment that is separate from the message body. Other parameters are available when the value is Attachment, including Filename, Creation-date, and Size.

This header is ignored when it appears in multipart/related body parts.

This header cannot contain any comments.
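The following Python sketch reads the Content-Disposition value and the Filename parameter from each body part by using the standard email package. The raw message, its boundary string, the file name, and the helper name dispositions are made up for illustration.

```python
from email import message_from_string

# Hypothetical message with a text body and one attached text file.
RAW_MESSAGE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="us-ascii"

See the attached report.
--XX
Content-Type: text/plain; name="report.txt"
Content-Disposition: attachment; filename="report.txt"

Quarterly numbers.
--XX--
"""


def dispositions(msg):
    """Return (disposition, filename) for each non-multipart body part."""
    return [
        (part.get_content_disposition(), part.get_filename())
        for part in msg.walk()
        if not part.is_multipart()
    ]


for disposition, filename in dispositions(message_from_string(RAW_MESSAGE)):
    print(disposition, filename)
```

The first part has no Content-Disposition header, so both values are None. The second part reports attachment and report.txt.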

The following example shows a message that has an attachment.

Figure An example of a message that has an attachment

The following example shows a Content-Disposition header field.

Figure Content-Disposition header highlighted in the message source

Character Encoding (charset)

A character set (charset) is a collection of letters and symbols. The charset information in a message tells the receiving client which character set to use to interpret the text. SMTP data is sent in US-ASCII format so that it can safely pass through an SMTP host. However, the message can actually contain text in another language, or the message can contain encoded information. Adding charset information to the data helps the receiver decode the message correctly.

Charset can be specified in the Content-Type header of each body part.

For example, the message in the “Content-Disposition header” figure in the preceding section shows the following charset encoding:

Content-Type: text/plain; charset="iso-8859-1"
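The following Python sketch shows how a receiving client might use the charset parameter. It first reverses the transfer encoding to recover the original bytes, and then uses the charset to convert those bytes to characters. The sample message and the helper name decoded_text are hypothetical. Falling back to us-ascii when no charset is given follows the default in RFC 2045.

```python
from email import message_from_string

# Hypothetical message: the octet E9 is the accented letter e in iso-8859-1.
RAW_MESSAGE = """\
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

caf=E9
"""


def decoded_text(part) -> str:
    """Return the text of a body part, decoded by using its declared charset."""
    charset = part.get_content_charset() or "us-ascii"
    raw_bytes = part.get_payload(decode=True)   # reverse the transfer encoding first
    return raw_bytes.decode(charset)            # then map the bytes to characters


msg = message_from_string(RAW_MESSAGE)
print(msg.get_content_charset())
print(decoded_text(msg).strip())
```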

Charset can also be specified in the message body. This is most common in an HTML message body, where the charset is specified in a META element in the HEAD section of the HTML.

In the following example, charset=us-ascii is specified in the body. The body is shown in its quoted-printable form, so each equal sign in the HTML appears as =3D:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Message</TITLE>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2800.1498" name=3DGENERATOR></HEAD>
<BODY>
The rain in Spain falls mainly on the=20
plain. </BODY></HTML>

Note

The charset that is specified in the Content-Type header and the charset that is specified in the message body for the same body part should match.
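A simple consistency check can be sketched in Python as follows. The helper name charsets_match, the sample message, and the regular expression are hypothetical. The pattern finds only the first charset declaration in the decoded HTML and is not a full HTML parser.

```python
import re
from email import message_from_string

# Hypothetical HTML body part, quoted-printable encoded.
RAW_MESSAGE = """\
MIME-Version: 1.0
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<HTML><HEAD><META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Dus-ascii"></HEAD><BODY>The rain in Spain</BODY></HTML>
"""

# First "charset=..." declaration inside the decoded HTML.
META_CHARSET = re.compile(r"""charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


def charsets_match(part) -> bool:
    """Compare the charset in the Content-Type header with the one declared in the HTML."""
    header_charset = part.get_content_charset()
    html = part.get_payload(decode=True).decode(header_charset or "us-ascii", errors="replace")
    found = META_CHARSET.search(html)
    if found is None or header_charset is None:
        return True   # nothing to compare
    return found.group(1).lower() == header_charset.lower()


print(charsets_match(message_from_string(RAW_MESSAGE)))
```

For this sample the function returns True. If the header declared a different charset than the META element, it would return False.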

The following example shows charset matching in an e-mail message.

Figure Charset matching in an e-mail message

Additional Resources

Handling character encodings in HTML and CSS

Introducing Character Sets and Encodings

[MS-OXGLOS]: Exchange Server Protocols Master Glossary
