Exchange Server 2003 Message Routing

 

For messages to nonlocal recipients, the routing engine must provide the advanced queuing engine with information about the next hop host on the transfer path of the destination and the next hop type, as discussed in the previous topic. The next hop host is the actual routing address and the next hop type determines how the advanced queuing engine handles the message. To provide this important information, the routing engine must have a complete view of the whole routing topology. This includes all routing groups and their servers, routing group connectors, and connectors to external messaging systems. In Exchange Server 2003, this information is stored in the Active Directory directory service.

Every Exchange 2003 server maintains its own routing table, called the link state table, dynamically in memory, based on Active Directory and link state information, as follows:

  • Routing-related Active Directory information   This information is stored in attributes of the organization object, routing group objects, connector objects, and server objects. These objects reside in the configuration directory partition and define the routing topology of the entire Exchange organization.

    Note

    Administrative groups are not part of the routing topology in an Exchange organization.

  • Link state information   This information specifies whether each connector in the routing topology is available (up) or unavailable (down). Link state information is dynamic and might change when a connector experiences transfer problems or when transfer issues are resolved. For more information about link state changes and the propagation of link state information across an Exchange organization, see Link State Propagation.

Upon startup, each Exchange server initializes its link state table with the following information from Active Directory:

  • Organization object   The routing topology boundary is the Exchange organization. That is, the link state table does not include any information about external bridgehead servers or messaging connectors in an external messaging system. As far as the routing engine is concerned, the routing topology ends at the connector to the external messaging system. Accordingly, the routing engine reads the GUID that is registered in the objectGUID attribute of the Exchange organization object in Active Directory and stamps the link state table with this GUID to determine the organization to which this routing information belongs.

  • Routing group objects   The routing engine enumerates all routing groups that exist in any administrative groups and queries each routing group for all object attributes, including the msExchRoutingGroupMembersBL attribute that contains a list of all routing group member servers. The routing engine puts this information in the link state table. The routing engine also puts the servers together with the GUID of the server's routing group in a server cache in memory. Each entry in the server cache is a server FQDN appended by the server's routing group GUID.

    Another important routing group attribute is the msExchRoutingMasterDN attribute, which points to the distinguished name of the routing group master in the selected routing group. For more information about the tasks and responsibilities of the routing group master, see the discussion later in this section.

  • Messaging connector objects   The routing engine enumerates all child objects with an object type of msExchConnector that exist in the Connections container of each routing group. The msExchConnector objects in the Connections container are the routing group connectors and connectors to external messaging systems configured in the routing group. The routing engine reads all attributes from these connector objects to determine address spaces, cost values, restrictions, and more. The routing engine puts the information for each connector in the link state table. This enables messages to destinations outside the local routing group to be routed.

The process continues until the routing engine identifies all directly and indirectly connected routing groups and queries for the configuration details of their messaging connectors. When this process ends, the routing engine has a complete view of all available transfer paths across the Exchange organization. All links are assumed to be up and available for message transfer. Following the initialization of the link state table, the routing engine communicates with the Microsoft Exchange Routing Engine service on the local server to obtain dynamic link state information that reflects the current state of each connector. The Exchange Routing Engine service connects to the routing group master in the local routing group through TCP port 691 to retrieve this information. For more information about link state information, see the section, "Examining the Link State Table," later in this topic.

Routing Engine and Exchange Routing Engine Service

The routing engine in the transport subsystem and the Exchange Routing Engine service have different roles. The Exchange Routing Engine service does not perform any message routing. The Exchange Routing Engine service communicates link state information between servers that are running Exchange 2000 Server and Exchange Server 2003 in the local routing group. The Exchange Routing Engine service is implemented in resvc.dll, which resides in the \Program Files\Exchsrvr\bin\ directory. The service name is RESvc. For more information about the Microsoft Windows services of Exchange Server 2003, see Exchange Server 2003 Services Dependencies.

The Exchange Routing Engine service is an intra-routing group link state communication service, not a routing engine. The actual routing engine that the SMTP advanced queuing engine and Exchange MTA use to route messages is implemented in a file that is named reapi.dll. For the Exchange MTA, some additional code is in mtaroute.dll. Therefore, when the Exchange Routing Engine service is stopped, both the advanced queuing engine and Exchange MTA still use the code in reapi.dll to route messages. The only difference is that dynamic updates to the link state table are no longer received.

Note

Although not generally recommended, you can disable the Exchange Routing Engine service on all servers that are running Exchange Server 2003 in an organization. The code in reapi.dll can still initialize the link state table on each server with information from Active Directory, but there are no dynamic updates to the link state table. In this case, Exchange Server 2003 performs static message routing.

Examining the Link State Table

The link state table is a small, in-memory database that is not stored on disk. To examine the entries that the routing engine uses to make routing decisions, you can use the Exchange Server 2003 WinRoute tool (Winroute.exe), which is available for download from the Downloads for Exchange Server 2003 Web site.

Note

The WinRoute tool also shipped with Exchange 2000 Server, but it is best to download and use the Exchange Server 2003 version of this tool on all Exchange 2000 and Exchange Server 2003 servers in your organization.

The WinRoute tool connects to the link state port, TCP port 691, on the selected Exchange server and extracts the link state table. The information in this table is a series of GUIDs and ASCII text that represent routing groups, routing group members, and connectors in the routing groups. The link state table also includes information about the configuration of each connector. The information fields in the link state table are separated by parentheses as follows:

'General Info' ('Routing Group' 'Routing Group Master' 'Version Info' 'Routing Group Addresses' (Routing Group Members) (Connectors in Routing Group (Connector configuration))).

The following is a shortened example of a link state table (all except one routing group removed):

d38082e7c9ecd74dbff32bada8932642 d037d6eaf2fa7cd10934aca433390623 (489416bfa3a4ff459b8f4403f20cad0d 1650c1fe32aef740be236e1089e0da6a 8 0 2 c2da71f9b39ec748aaf44119a2bdcb36 {26}*.489416BF-A3A4-FF45-9B8F-4403F20CAD0D {4c}c=DE;a= ;p=TailspinToys;o=Exchange;cn=489416BF-A3A4-FF45-9B8F-4403F20CAD0D;* {55}/o=TailspinToys/ou=First administrative Group/*/489416BF-A3A4-FF45-9B8F-4403F20CAD0D ( 1650c1fe32aef740be236e1089e0da6a YES 1 1b20 {10}0701000000000101 ) ( aa582d35e9621c4ca8ae57aa33d953a1 ( CONFIG {4}SMTP {} {23}_aa582d35e9621c4ca8ae57aa33d953a1_D {63}/o=TailspinToys/ou=First administrative Group/cn=Configuration/cn=Connections/cn=RGC RG A <-> RG B 0 0 0 0 ffffffff ffffffff 0 1 0 () 0 () 0 () 0 () ARROWS ( {2}RG {20}83bd0e29fad06d4eb8b00faab3265cd5 1 {4}X400 {23}c=DE;a= ;p=TailspinToys;o=Exchange; 1 ) BH () TARGBH ( 766a192b43bfc3459ee85608d65a98a9 CONN_AVAIL {19}server01.TailspinToys.com ) STATE UP)))

(... next routing group... (... next routing members...) (... connectors in routing group (... connector configuration..)))

The following table maps this information to the various information fields in the link state table.

Field Value Comments

Organization objectGUID

d38082e7c9ecd74dbff32bada8932642

The GUID that is registered in the objectGUID attribute of the Exchange organization object in Active Directory.

MD5 Digest

d037d6eaf2fa7cd10934aca433390623

An MD5 digest or hash value. This is a signature that represents the version number for the link state table. Based on this information, routing engines can determine whether they have the same link state information. If the information differs, routing engines exchange OrgInfo packets to determine which server has the most up-to-date information. The OrgInfo packet contains the link state table, with all details and states of the routing topology. The propagation of link state information is discussed later in this section.

Routing Group objectGUID

489416bfa3a4ff459b8f4403f20cad0d

The GUID that is registered in the objectGUID attribute of the routing group object to which the routing information belongs. This GUID follows next in the link state table.

Routing Group Master objectGUID

1650c1fe32aef740be236e1089e0da6a

The GUID that is registered in the objectGUID attribute of the server that acts as the routing group master in this routing group.

The routing group master within each routing group is responsible for maintaining and communicating link state information to all routing group members. Only one routing group master exists per routing group. For more information about the role of the routing group master, see the discussion later in this section.

Version Info

8 0 2 c2da71f9b39ec748aaf44119a2bdcb36

The values 8 0 2 are the major, minor, and user versions of the link state information. The routing engine uses this version information to classify updates to the link state information, as follows:

  • Major updates   Represent routing topology changes, such as connector configuration changes (that is, adding or deleting a connector, adding or deleting an address space on a connector, or designating a new server as the routing group master).

  • Minor updates   Represent changes to the availability of a virtual server or connector. For example, the state of a connector might change from up to down if the connector's source bridgehead server is unavailable.

  • User updates   Represent changes that occur when services are started or stopped on an Exchange server or when a server loses its connectivity to the routing group master. Adding a new server to a routing group also represents a user update.

The remaining data is the GUID of this version information.

Routing Group Addresses

{26}*.489416BF-A3A4-FF45-9B8F-4403F20CAD0D {4c}c=DE;a= ;p=TailspinToys;o=Exchange;cn=489416BF-A3A4-FF45-9B8F-4403F20CAD0D;* {55}/o=TailspinToys/ou=First administrative Group/*/489416BF-A3A4-FF45-9B8F-4403F20CAD0D

Maps SMTP, X.400, and X.500 address information to individual routing group GUIDs. The routing engine uses this information to generate an internal server cache, which is used to determine the routing group of each server in the routing topology. The server cache is an internal table of the routing engine.

For example, assume that SERVER01 in a routing group named First Routing Group has an FQDN of SERVER01.TailspinToys.com. According to the routing group address definition, the routing engine creates an entry for SERVER01 in the server cache, as follows:

SERVER01.TailspinToys.com.489416BF-A3A4-FF45-9B8F-4403F20CAD0D.

During a routing event, when the advanced queuing engine passes the FQDN to the routing engine, the routing engine looks up the server cache, finds the entry for SERVER01.TailspinToys.com, and quickly determines the target routing group. The principle is the same for X.400 and X.500 addresses; only the address information is more complex.
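The server cache lookup described above can be pictured with a short Python sketch. It is only an illustration: the dictionary layout and the function names (build_server_cache, cache_entry, find_routing_group) are assumptions made for this example and are not part of Exchange. The only detail taken from the text is that an entry pairs a server FQDN with the GUID of its routing group.

```python
# Hypothetical sketch of a server cache; none of these names are Exchange APIs.

def build_server_cache(routing_groups):
    """Map each server FQDN (lowercased) to the GUID of its routing group.

    routing_groups: dict of routing group GUID -> list of member server FQDNs.
    """
    cache = {}
    for rg_guid, members in routing_groups.items():
        for fqdn in members:
            cache[fqdn.lower()] = rg_guid
    return cache


def cache_entry(fqdn, rg_guid):
    """Render an entry in the 'FQDN.GUID' form shown in the text."""
    return f"{fqdn}.{rg_guid}"


def find_routing_group(cache, fqdn):
    """Return the routing group GUID for a server FQDN, or None if unknown."""
    return cache.get(fqdn.lower())


routing_groups = {
    "489416BF-A3A4-FF45-9B8F-4403F20CAD0D": ["SERVER01.TailspinToys.com"],
}
cache = build_server_cache(routing_groups)
target_group = find_routing_group(cache, "SERVER01.TailspinToys.com")
```

A dictionary keeps the lookup to a single step, which matches the idea that the routing engine can determine the target routing group quickly from the FQDN alone.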

Routing Group Members

( 1650c1fe32aef740be236e1089e0da6a YES 1 1b20 {10}0701000000000101 )

Contains a list of all servers that belong to the routing group and identifies their state. However, note that the routing engine does not use this information for message routing. As discussed earlier in this section, the routing engine uses the server cache.

The routing group members are listed in the Routing Group Members () list for the purposes of system monitoring. You can view this information in Exchange System Manager, when you open the Tools node, then Monitoring and Status, and then Status.

The server status entries in the Routing Group Members () list contain the following information:

  • The objectGUID of the server: 1650c1fe32aef740be236e1089e0da6a

  • Whether the member is connected to the routing group master. YES indicates that the server is connected.

  • Server version number: 1

  • Build version: 1b20 hex = 6944

  • User data: {10}0701000000000101

The user data indicates the state of the server. If the value begins with 0701, the server is available and operating. If the value begins with 0702, the server is in a warning state. If the value begins with 0703, the server is in a critical state.

You can switch a server to maintenance mode to deselect server monitoring temporarily, in which case the value begins with 0781.
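As an illustration of how the fields of a member entry fit together, the following Python sketch decodes the sample entry from this row. The function name decode_member and the returned dictionary keys are hypothetical; the state prefixes are the ones listed above, and the sketch assumes that the entry contains exactly five whitespace-separated fields.

```python
# Hypothetical decoder for one Routing Group Members entry (illustration only).

STATE_BY_PREFIX = {
    "0701": "available",
    "0702": "warning",
    "0703": "critical",
    "0781": "maintenance",
}


def decode_member(entry):
    """Decode '( guid YES 1 1b20 {10}0701000000000101 )' into a dict."""
    tokens = entry.strip("() ").split()
    guid, connected, version, build_hex, user_field = tokens
    # Drop the '{length}' prefix to obtain the raw user data.
    user_data = user_field.split("}", 1)[1]
    return {
        "server_guid": guid,
        "connected_to_master": connected == "YES",
        "version": int(version),
        "build": int(build_hex, 16),  # the build number is stored in hexadecimal
        "state": STATE_BY_PREFIX.get(user_data[:4], "unknown"),
    }


member = decode_member(
    "( 1650c1fe32aef740be236e1089e0da6a YES 1 1b20 {10}0701000000000101 )"
)
```

For the sample entry, the build value decodes to 6944 and the state to "available".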

Connectors in Routing Group

( aa582d35e9621c4ca8ae57aa33d953a1 ( CONFIG ))

Starting at the next open parenthesis, each connector that belongs to the routing group is listed in a separate entry that includes the connector's objectGUID and the configuration information that the routing engine uses to make message routing decisions.

Note

The connector configuration information in the link state table has the fields that are described in the following table entries.

Connector objectGUID

aa582d35e9621c4ca8ae57aa33d953a1

The GUID that uniquely identifies the connector in the Exchange organization.

Connector Type

{4}SMTP

Following the CONFIG keyword, this field identifies the connector type. The type can be SMTP, X.400, Notes, or Exchange Development Kit (EDK). The Notes and EDK types refer to instances of a MAPI-based messaging connector connecting to a non-Exchange messaging system. For more information about MAPI-based connectors, see Gateway Messaging Connectors Architecture.

Tip

The number in curly brackets is not an identifier. This number indicates the string length of the field value in hexadecimal format.

Note

There is no explicit type for routing group connectors. Routing group connectors use SMTP to transfer messages.
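Because string values can contain spaces, a reader of the raw table has to honor the hexadecimal length prefix instead of splitting on white space. The following Python sketch shows one possible way to do this. The function name parse_link_state and the nested-list result are assumptions made for this example; the sketch does not describe how WinRoute or the routing engine actually parses the data.

```python
# Hypothetical parser for the raw link state text (illustration only).

def parse_link_state(text):
    """Parse link state text into nested Python lists.

    '(' and ')' delimit nested lists, '{n}' introduces a string of n
    characters (n in hexadecimal), and all other tokens end at white space.
    """
    root = []
    stack = [root]
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
            i += 1
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' at offset %d" % i)
            stack.pop()
            i += 1
        elif ch == "{":
            close = text.index("}", i)
            digits = text[i + 1:close]
            length = int(digits, 16) if digits else 0  # '{}' is an empty value
            start = close + 1
            stack[-1].append(text[start:start + length])
            i = start + length
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in "()":
                j += 1
            stack[-1].append(text[i:j])
            i = j
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


sample = (
    "( CONFIG {4}SMTP {} {23}_aa582d35e9621c4ca8ae57aa33d953a1_D "
    "1 ( {2d}CN=Ted Bremer,CN=Users,DC=TailspinToys,DC=com ) STATE UP )"
)
parsed = parse_link_state(sample)
```

In the sample, the value prefixed with {2d} is read as 45 characters, so the space in "Ted Bremer" stays inside one field.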

Source Bridgehead Address

{}

This field can have one of three values:

  • No value   If no source bridgehead server is specified, then any server in the local routing group can use this connector to transfer messages. This applies to routing group connectors if the option Any local server can send mail over this connector is used.

  • A connector GUID   For SMTP connectors and routing group connectors, you can specify specific local bridgehead servers, in which case the Source Bridgehead Address field lists the connector GUID appended by an "_S" (without the quotation marks), to indicate a source bridgehead, such as:

    {23}_76290a25817c0643a1a6999e669b1d5f_S

    The local bridgehead servers are then listed later in the BH field in the connector information.

  • A bridgehead address   X.400 connectors and MAPI-based connectors cannot have more than one local bridgehead server. For these connectors, the local bridgehead server is specified in the Source Bridgehead Address field, such as: {8}SERVER01. To provide availability information, the local bridgehead server might also be listed later in the BH field in the connector information.

Destination Bridgehead Address

{23}_aa582d35e9621c4ca8ae57aa33d953a1_D

As with the Source Bridgehead Address field, this field can have one of three values:

  • No value   X.400 connectors and MAPI-based connectors do not have a destination bridgehead server in the link state table. These connectors use connector-specific information to determine their target system, such as the remote host name in the stacks configuration of an X.400 connector.

  • A connector GUID   For routing group connectors, the Destination Bridgehead Address field lists the connector GUID appended by a "_D" (without the quotation marks) to indicate a destination bridgehead. In this case, the target bridgehead servers are listed later in the TARGBH field in the connector information.

  • A bridgehead address   SMTP connectors cannot have multiple destination hosts when they connect routing groups to each other. The connector configuration requires you to specify a smart host in the remote routing group, which is then indicated as the destination bridgehead, such as: {8}SERVER02.

Legacy Distinguished Name

{63}/o=TailspinToys/ou=First administrative Group/cn=Configuration/cn=Connections/cn=RGC RG A <-> RG B

This is the distinguished name of the connector in legacy Exchange 5.5 directory format. The value corresponds to the legacyExchangeDN attribute of the connector object in Active Directory.

Schedule ID

0

The Schedule ID field is not used and is always set to 0. The advanced queuing engine and Exchange MTA query Active Directory to determine the activation schedule of a connector.

Restrictions

0 0 0 ffffffff ffffffff 0 1 0 () 0 () 0 () 0 ()

The Restrictions field identifies the scope of the connector, message size restrictions, and other constraints, as follows:

  • The scope of the connector is identified by the first digit. A value of 0 indicates that the scope is "Organization." A value of 1 indicates that the scope is "Routing Group."

Note

Routing group connectors always have a scope of "Organization." Connectors to external messaging systems can be restricted to the local routing group.

  • The next digit indicates whether triggered delivery is configured. A value of 0 means no triggered delivery. A value of 1 means that the remote host must trigger the message transfer (for example, TURN/ETRN).

  • The third digit identifies the message type (high, normal, low, system, and non-system) that is allowed through this connector.

  • The next eight hexadecimal digits specify message size restrictions, if any. If no message size restrictions apply to this connector, the value is ffffffff.

  • The second block of eight hexadecimal digits indicates whether a large message threshold is set. The value ffffffff indicates that no message threshold is set. Any other value specifies the threshold in kilobytes.

  • The following digit specifies whether public folder referrals are allowed (0 = allowed, 1 = not allowed).

  • The next digit indicates whether messages are accepted from everyone by default. A value of 1 means that all messages are accepted by default. A value of 0 means that all messages are denied by default.

  • The next four fields (0 () 0 () 0 () 0 ()) are lists of originators and distribution lists that are allowed or denied to send messages through this connector. The first list contains the distinguished names of allowed originators, the second list contains the distinguished names of denied originators, and the final two lists contain the allowed and denied distribution groups, respectively. The number in front of each pair of parentheses is the number of entries in that list.

    The following is an example of a list with two originators (the format is the same for all accept and deny lists): 2 ( {2d}CN=Ted Bremer,CN=Users,DC=TailspinToys,DC=com {30}CN=Administrator,CN=Users,DC=TailspinToys,DC=com ).
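The leading values of the Restrictions field can be read as in the following Python sketch. The Restrictions class and the decode_restrictions function are hypothetical names chosen for this example. The sketch covers only the first seven values, leaves the message type undecoded because its encoding is not described here, and does not assume a unit for the message size limit.

```python
# Hypothetical decoder for the leading Restrictions values (illustration only).
from dataclasses import dataclass
from typing import Optional

NO_LIMIT = 0xFFFFFFFF  # ffffffff means that no limit or threshold is set


@dataclass
class Restrictions:
    scope: str
    triggered_delivery: bool
    message_type: str  # kept as the raw value
    max_message_size: Optional[int]
    large_message_threshold_kb: Optional[int]
    public_folder_referrals: bool
    accept_by_default: bool


def _limit(hex_value):
    value = int(hex_value, 16)
    return None if value == NO_LIMIT else value


def decode_restrictions(field):
    """Decode the first seven values of a Restrictions field."""
    scope, triggered, msg_type, size, threshold, referrals, accept = field.split()[:7]
    return Restrictions(
        scope="Organization" if scope == "0" else "Routing Group",
        triggered_delivery=triggered == "1",
        message_type=msg_type,
        max_message_size=_limit(size),
        large_message_threshold_kb=_limit(threshold),
        public_folder_referrals=referrals == "0",  # 0 = allowed
        accept_by_default=accept == "1",
    )


restrictions = decode_restrictions("0 0 0 ffffffff ffffffff 0 1 0 () 0 () 0 () 0 ()")
```

For the sample value, the result describes an organization-wide connector with no size limit, no large message threshold, public folder referrals allowed, and messages accepted from everyone by default.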

Address Spaces

ARROWS ( {2}RG {20}83bd0e29fad06d4eb8b00faab3265cd5 1 {4}X400 {23}c=DE;a= ;p=TailspinToys;o=Exchange; 1 )

Each connector has at least one associated address space. The routing engine uses this information to determine possible connectors for a particular message by comparing the recipient addresses with available address space information.

In the link state table, the ARROWS () list contains the individual address spaces that belong to the connector. Each address space entry contains the following three pieces of information:

  • Address space type   The address space type determines the format of the address space information that follows in the next position. For example, an X.400 address space requires address space information in a valid X.400 format. An SMTP address space, on the other hand, contains parts of an SMTP domain name. For routing group connectors, the address space type is RG, which stands for a routing group objectGUID.

  • Address space   The address space specifies the address pattern that the routing engine compares to the recipient addresses to identify the destination of the message. The routing engine uses address spaces differently between external and internal recipients.

    For external recipients, the destination is a messaging connector to the external messaging system. The advanced queuing engine passes the external address information to the routing engine and the routing engine selects the connector that most closely matches the destination. For example, if an SMTP connector has an address space of SMTP: *.net and another SMTP connector has an address space of SMTP: *, the routing engine selects the first SMTP connector for all recipients that are in .net domains and the second SMTP connector for all remaining Internet recipients.

    For recipients in the local organization, address spaces are defined in recipient policies (address space setting This Exchange Organization is responsible for all mail delivery to this address). If a recipient's address matches an address space for the local organization, the categorizer determines the recipient's home server based on the recipient's msExchHomeServerName attribute. The categorizer stamps the recipient with the home server's FQDN, and the advanced queuing engine passes that FQDN to the routing engine to route the message to its destination in the local organization. The routing engine uses the FQDN to search its server cache. It finds an entry for the home server, and this entry includes the recipient's home routing group GUID.

    The routing engine uses the recipient's home routing group GUID to determine how the message must be transferred, as follows:

    1. If the home routing group GUID is equal to the local routing group GUID, then the recipient is in the local routing group, and the message must be transferred directly to the recipient's home server using the Exchange server's default virtual SMTP server. The routing engine returns the FQDN of the recipient's home server to the advanced queuing engine to indicate the next hop host.

      Note

      Servers running Exchange Server 5.5 are exceptions that communicate with Exchange 2003 in the local routing group through RPCs and the MTA service, as explained earlier in this section.

    2. If the home server's routing group is not the local routing group, then the message must be transferred to the destination using a routing group connector. Connectors that can transfer messages to a routing group must have a routing group address space that includes the destination group's GUID. Therefore, the routing engine can create a topology view that includes all possible transfer paths, beginning at the source and ending at all possible destinations in the Exchange organization. Based on the recipient's routing group GUID, the routing engine can find the ultimate destination of the message in the Exchange organization and can then return the next hop on the shortest path to that destination to the advanced queuing engine. This is explained in more detail later in this section.

  • Cost   Cost values are associated with address spaces and determine which connector is preferred for message transfer. The value can range from 1 to 100. If multiple connectors exist for the same destination, the connector with the lowest cost value is preferred. If multiple connectors have the same cost value, the routing engine selects a random connector to provide a simple form of load balancing.
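The selection among connectors for an external recipient can be sketched in Python as follows. The function name select_connector and the tuple layout are hypothetical. The order of the tie-breakers (closest address space match first, then lowest cost, then a random choice) is an assumption made for this example, as is the use of the literal pattern length as a measure of how closely an address space matches.

```python
# Hypothetical connector selection for an SMTP domain (illustration only).
import fnmatch
import random


def select_connector(domain, connectors, rng=random):
    """Pick a connector name for a recipient domain, or None if none matches.

    connectors: list of (name, address_space_pattern, cost) tuples.
    """
    matches = [
        c for c in connectors
        if fnmatch.fnmatchcase(domain.lower(), c[1].lower())
    ]
    if not matches:
        return None

    def specificity(pattern):
        # Assumption: more literal characters means a closer match.
        return len(pattern.replace("*", ""))

    best = max(specificity(c[1]) for c in matches)
    closest = [c for c in matches if specificity(c[1]) == best]
    lowest_cost = min(c[2] for c in closest)
    cheapest = [c for c in closest if c[2] == lowest_cost]
    # Equal cost: choose at random for a simple form of load balancing.
    return rng.choice(cheapest)[0]


connectors = [("SMTP-NET", "*.net", 1), ("SMTP-ALL", "*", 1)]
net_choice = select_connector("example.net", connectors)
other_choice = select_connector("example.com", connectors)
```

With the two address spaces from the example in the text, recipients in .net domains go to the first connector and all other Internet recipients go to the second.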

Source Bridgeheads

BH ()

The BH field lists the local bridgehead servers for the connector and their status information. Bridgehead servers are identified using the following three pieces of information:

  • Bridgehead Server objectGUID   The GUID of a virtual SMTP server, which is specified in the connector configuration as a local bridgehead server.

  • Bridgehead Server Status   Information that indicates the availability of the bridgehead server, as follows:

    • CONN_AVAIL   The bridgehead server is available.

    • VS_NOT_STARTED   A virtual SMTP server is stopped or is not started.

    • CONN_NOT_AVAIL   The connection is unavailable on the bridgehead server. For example, the source bridgehead server cannot establish a connection to a destination bridgehead server.

  • Virtual Server FQDN   The FQDN of the virtual server that acts as a bridgehead server for this connector.

Destination Bridgeheads

TARGBH ( 766a192b43bfc3459ee85608d65a98a9 CONN_AVAIL {19}server01.TailspinToys.com )

As with the BH () list, the TARGBH () list contains the destination bridgehead servers for a connector. This list is particularly important for routing group connectors, which can have more than one remote bridgehead server.

In the example, the following information identifies the remote bridgehead server:

  • Bridgehead Server objectGUID   766a192b43bfc3459ee85608d65a98a9

  • Bridgehead Server Status   CONN_AVAIL

  • Virtual Server FQDN   {19}server01.TailspinToys.com

Status

STATE UP

The status of the connector. This field can have two possible values:

  • STATE UP   Indicates that the connector is available.

  • STATE DOWN   Indicates that the connector is unavailable.

The connector state is derived from the state of the connector's source bridgehead servers. A connector is STATE UP only if at least one source bridgehead server is available (CONN_AVAIL). If none of the connector's source bridgehead virtual servers is started (VS_NOT_STARTED) or the source bridgeheads cannot establish a connection (CONN_NOT_AVAIL), the connector state is STATE DOWN.

Note

For a connector to be marked as down, all local bridgehead servers for this connector must be unavailable. Routing group connectors configured to use the option Any local server can send mail over this connector, in addition to DNS-routed SMTP connectors and MAPI-based connectors, are never marked as down. Routing group connectors, in which one bridgehead server is an Exchange 5.5 server, are never marked as down.
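The rule for deriving the connector state can be expressed as a small Python sketch. The function name connector_state and the never_marked_down flag are hypothetical; the flag stands in for the connector types that, according to the preceding note, are never marked as down.

```python
# Hypothetical derivation of a connector state (illustration only).

def connector_state(bridgehead_states, never_marked_down=False):
    """Return 'STATE UP' or 'STATE DOWN' for a connector.

    bridgehead_states: status values of the source bridgehead servers,
    such as 'CONN_AVAIL', 'VS_NOT_STARTED', or 'CONN_NOT_AVAIL'.
    """
    if never_marked_down:
        return "STATE UP"
    if any(state == "CONN_AVAIL" for state in bridgehead_states):
        return "STATE UP"
    return "STATE DOWN"


one_available = connector_state(["VS_NOT_STARTED", "CONN_AVAIL"])
all_unavailable = connector_state(["VS_NOT_STARTED", "CONN_NOT_AVAIL"])
any_local_server = connector_state([], never_marked_down=True)
```

One available source bridgehead is enough to keep the connector up, which is why the first call yields STATE UP and the second yields STATE DOWN.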

Note

The WinRoute tool provides an intuitive view of the routing topology and link state table by resolving the GUIDs in the link state table to names in a format that you can read, if the tool has access to Active Directory. The upper pane of the WinRoute program window displays the interpreted data, the middle pane lists all existing address spaces, and the lower pane displays the raw information from the link state table. For more information about the WinRoute tool, see tools downloads at Tools for Exchange Server 2003.

Exchange Routing Path Selection

In an organization with multiple routing groups, various routes might lead to the same destination. Typically, the most efficient (that is, the shortest or cheapest) route is used for message transfer, and additional routes stand by, in case the best route is temporarily unavailable. For example, in the topology shown in the following figure, multiple transfer routes exist between all routing groups.

A routing topology with five routing groups

Note

Message routing should follow the physical network topology. If the underlying network topology is designed in a true hub-and-spoke arrangement (with routing group A as the hub), it makes little sense to define routing group connectors as shown in the figure above. Instead, routing groups B, C, D, and E should be connected directly to routing group A, and all inter-routing group message transfer should be routed through routing group A. In a genuine hub-and-spoke arrangement, there are no alternate message paths, and the routing path selection is straightforward. For the explanations in this section, however, it is assumed that the physical network topology is a mesh that follows the arrangement of routing group connectors shown.

The routing engine uses the following information to determine the best route:

  • Address space   When configuring messaging connectors, you associate possible destinations with the connectors by using the Connected Routing Groups tab in the connector properties. However, the routing group connector does not provide this tab. Because this connector is used only to connect to routing groups, the routing engine can determine the routing group address spaces from the connector configuration.

    Routing group GUIDs and RG address spaces

    Address spaces can be added to a connector through the Address Space tab. As mentioned in the "Information in the link state table" table, address spaces consist of an address type, such as RG, SMTP, X400, MSMAIL, or CCMAIL, an address, and a cost value. The cost value that you assign to an address space is an important routing criterion. The routing engine uses the Dijkstra shortest-path algorithm to make routing decisions. This algorithm is based on cost values.

  • Connector scope   Connectors to external messaging systems might be restricted to the connector's routing group, in which case only users in the local routing group of the connector are permitted to use this connector. By default, all connectors have a scope of Entire Organization.

    Note

    Routing group connectors are always available across the whole organization.

  • Restrictions   The routing engine determines the message size, priority, and message type (that is, system or non-system message). The routing engine compares these properties with available connector restriction information. It then excludes those connectors that cannot transfer the message due to effective connector restrictions from the list of potential connectors.

  • Status   Only available connectors are included in the route selection process. The status field of each connector indicates whether the connector is available (STATE UP) or unavailable (STATE DOWN).

Routing Path Selection Process

To select the best route to the destination, the routing engine calculates the shortest transfer route from the source routing group to the destination routing group across the Exchange organization, using the Dijkstra shortest-path algorithm. The routing engine then determines the next hop on the shortest route that the advanced queuing engine should use for message transfer.
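The following Python sketch shows how a Dijkstra calculation can yield the next hop on the cheapest route between routing groups. It is a generic textbook version, not the code in reapi.dll. The function name next_hop, the graph layout, and the cost values between routing groups A through E are invented for illustration, and the graph is assumed to contain only connectors that are available (STATE UP) and permitted for the message.

```python
# Generic Dijkstra sketch with invented routing group costs (illustration only).
import heapq


def next_hop(graph, source, destination):
    """Return (next_hop, total_cost) for the cheapest route, or None.

    graph: dict of routing group -> dict of neighbor routing group -> cost.
    """
    best = {source: 0}
    visited = set()
    # Heap entries: (cost so far, routing group, first hop taken from source).
    heap = [(0, source, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            return hop, cost
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor in visited:
                continue
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                first = neighbor if node == source else hop
                heapq.heappush(heap, (new_cost, neighbor, first))
    return None


graph = {
    "A": {"B": 10, "C": 20},
    "B": {"A": 10, "D": 10},
    "C": {"A": 20, "D": 5, "E": 30},
    "D": {"B": 10, "C": 5, "E": 10},
    "E": {"C": 30, "D": 10},
}
route_to_e = next_hop(graph, "A", "E")
```

With these invented costs, the cheapest route from A to E runs through B and D at a total cost of 30, so the next hop returned for routing group A is B.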

The routing path selection is a two-step process:

  1. The advanced queuing engine calls the GetMessageType method on the IMessageRouter interface of the routing engine. In the GetMessageType method, the advanced queuing engine passes the message to the routing engine in the form of a MailMsg object.

    In this step, the routing engine performs the following processes:

    1. It checks message-trace information to detect loops. If a message loop is detected, the message is dropped with an NDR to the sender.

    2. It reads or recalculates (if necessary) the current organization topology (that is, it determines the list of shortest routes to all destinations in the routing topology, starting from the local routing group).

    3. It checks and possibly refreshes restriction information about connectors in the link state table.

    4. It determines all connectors to the message destination in the organization topology, and then analyzes message characteristics and connector restrictions to exclude all those connectors that must not be used to transfer the message.

    5. It computes a filter value for the message, which uniquely defines the message type. The message type identifies the actual path that messages with similar characteristics can use. The message type is cached. Therefore, the routing engine does not recalculate the filter value for subsequent messages with similar characteristics.

      Note

      The advanced queuing engine maintains a separate message queue for each message type.

    6. It creates associated message types. An associated message type is similar to the actual message type, but is calculated with relaxed restrictions. Associated message types enable the SMTP transport subsystem to return extended error codes if a transfer path is not available for the actual message type because of connector restrictions.

    7. It returns the index of the cached message type to the advanced queuing engine.

  2. The routing engine determines the next hop on the shortest route. To complete this step, the advanced queuing engine calls the GetNextHop method on the IMessageRouter interface. The most important information that the advanced queuing engine passes to the routing engine at this point is the destination address and the message type ID. For recipients in the Exchange organization, the destination address is the FQDN of the recipients' home server. The routing engine determines the destination routing group from the server cache, checks the available route for the message type, and returns the next hop on the route to the destination routing group to the advanced queuing engine. The advanced queuing engine can then transfer the message to the next hop on the way to the destination.
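
The two-step lookup can be illustrated with a small Python model. This is a sketch under stated assumptions, not the routing engine's implementation: the class `RoutingSketch`, its methods, and the idea of using the set of permitted connectors as the filter value are all hypothetical, and message size is the only restriction modeled.

```python
class RoutingSketch:
    """Toy model of the two-step lookup: message type first, next hop second."""

    def __init__(self, connector_limits_kb, routes):
        # connector_limits_kb: connector name -> maximum size in KB (None = no limit)
        self.connector_limits_kb = connector_limits_kb
        # routes: destination routing group -> list of (connector, next hop),
        # ordered from lowest to highest cost
        self.routes = routes
        self._type_ids = {}   # filter value -> message type index
        self._filters = []    # message type index -> filter value

    def get_message_type(self, size_kb):
        # Assumption: the filter value is the set of connectors the message may use.
        allowed = frozenset(
            name for name, limit in self.connector_limits_kb.items()
            if limit is None or size_kb <= limit
        )
        if allowed not in self._type_ids:   # computed once, then served from the cache
            self._type_ids[allowed] = len(self._filters)
            self._filters.append(allowed)
        return self._type_ids[allowed]      # index of the cached message type

    def next_hop_for(self, destination_group, message_type):
        allowed = self._filters[message_type]
        for connector, next_hop in self.routes.get(destination_group, []):
            if connector in allowed:
                return next_hop
        return None   # no usable path for this message type


router = RoutingSketch(
    {"RGC-limited": 100, "RGC-open": None},
    {"RG-E": [("RGC-limited", "bridgehead-1"), ("RGC-open", "bridgehead-2")]},
)
small = router.get_message_type(10)    # may use both connectors
large = router.get_message_type(500)   # excluded from RGC-limited
```

Messages with similar characteristics map to the same cached index, so the second call only needs the destination and that index.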

Dijkstra's Shortest-Path Algorithm

To make correct routing decisions, the routing engine must know the shortest routes to all possible destinations in the routing topology. The routing engine must find the shortest routes from all available transfer routes to all destinations in a complex routing topology. This problem is known as the single-source shortest paths problem.

The following figure shows that even in a relatively straightforward routing topology, many routes can exist from one routing group to any other routing group. The figure shows the routing group connectors from Figure 5.4 in simplified form, with their default cost values of 1.

Routing group connectors with default cost values

In 1959, Professor Edsger Dijkstra solved the single-source shortest paths problem by developing an algorithm that locates, in a single calculation, the shortest paths from a given source to all points in a topology.

The routing engine uses the Dijkstra algorithm, as follows:

  1. It is assumed that the routing topology representing all the paths from one routing group to all other routing groups is a spanning tree. This means that the topology must include all routing groups and routing group connectors, and that there are no loops between routing groups. Therefore, paths in the routing topology that allow a message to return to the source routing group are illegal transfer paths and are not included in the calculation.

  2. Based on Dijkstra's algorithm, the routing engine maintains two sets of routing groups. The first set includes all groups for which the shortest path from the source routing group has already been determined. The second set includes all remaining routing groups. At first, the set of routing groups for which the shortest paths from the source routing group have already been determined is empty. As long as there are routing groups remaining that have not been processed, the routing engine performs Steps 3 through 6, as follows.

  3. The routing engine sorts the remaining routing groups according to the current best estimate of their distance (that is, the sum of cost values) from the source routing group.

  4. It then adds the closest routing group to the set of routing groups for which the shortest paths have been determined.

  5. The routing engine then updates the costs of all the routing groups connected to that routing group (if this improves the best estimate of the shortest path for each of the remaining routing groups) by including the cost value of the connector between those routing groups in the distance value.

  6. It updates the predecessor for all updated routing groups. The list of predecessors eventually defines the shortest path from each routing group back to the source routing group.

The following steps illustrate how the routing engine finds the shortest paths from routing group A to all other routing groups in the routing topology:

  1. The calculation begins at routing group A because in this example the source is routing group A. The distance value of routing group A to itself is zero. The distance value of all other routing groups has not been determined.

  2. Routing group A is added to the set of routing groups for which the shortest paths from the source routing group have been determined. Then, the distance value of all routing groups adjacent to routing group A is updated with the cost values of their connectors. The predecessor (indicated by the source of the black arrows) for all these routing groups is then updated. The predecessor is routing group A.

  3. The routing engine sorts the remaining routing groups according to the current best estimate of their distance from the source routing group. It adds the closest routing group to the set of routing groups for which the shortest paths have been determined. Because routing groups B and C have the same distance value, the routing engine selects one routing group at random. This example assumes that the routing engine selects routing group B.

  4. The routing engine calculates the distance value of all remaining routing groups adjacent to routing group B, by combining the cost value of the connector between routing group B and the adjacent routing group with the distance value of routing group B. It updates the distance value of an adjacent routing group only if the calculated distance value is smaller than the value that is already assigned to the routing group, and only then updates the predecessor (indicated by black arrows).

    The neighbors of routing group B are routing groups C, D, and E. The current distance value of routing groups D and E is not defined. Therefore, their distance value is updated with the cost values of their connectors, plus the distance value of routing group B (1+1). Then the predecessor (indicated by the source of the black arrows) for these routing groups is updated. The predecessor is routing group B.

    Routing group C is not updated, because the sum of the distance value of routing group B and the connector cost (1+1) is larger than the current distance value of routing group C.

  5. The routing engine sorts the remaining routing groups according to the current best estimate of their distance from the source routing group and adds the closest routing group to the set of routing groups for which the shortest paths have been determined. The algorithm now picks routing group C, because this routing group has the smallest distance value.

  6. The routing engine calculates the distance value of all remaining routing groups adjacent to routing group C, by combining the cost value of the connector between routing group C and the adjacent routing groups with the distance value of routing group C. It updates the distance value of an adjacent routing group only if the calculated distance value is smaller than the value that is already assigned to the routing group, and only then updates the predecessor (indicated by black arrows).

    The remaining routing groups that are neighbors of routing group C are routing groups D and E (routing groups A and B were already processed).

    The current distance value of routing groups D and E is 2. The sum of the connector cost and the distance value of routing group C (1+1) is not smaller than this value. Therefore, the distance value and predecessor list of routing groups D and E are not updated.

  7. The routing engine sorts the remaining routing groups (routing groups D and E) according to the current best estimate of their distance from the source routing group and adds the closest routing group to the set of routing groups for which the shortest paths have been determined.

    Because routing groups D and E have the same distance value, the routing engine selects one routing group at random. This example assumes that the routing engine chooses routing group D.

    The only remaining neighbor is routing group E, which has a current distance value of 2. This value is smaller than the sum of the connector cost and distance value of routing group D (1+2). Therefore, the distance value and predecessor list of routing group E are not updated.

    The last routing group that has not been added to the list of routing groups for which the shortest paths have been determined is routing group E. There are no remaining adjacent routing groups. Therefore, the calculation of the shortest path is complete. The shortest paths from routing group A to any other routing group in the topology have been determined.
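
The walkthrough can be reproduced with a short Python sketch. It is a generic implementation of Dijkstra's algorithm using a priority queue, not the routing engine's code. The connector list is inferred from the steps above (A-B, A-C, B-C, B-D, B-E, C-D, C-E, D-E, each with cost 1), and ties are broken alphabetically rather than at random, which happens to match the choices made in the example.

```python
import heapq


def shortest_paths(topology, source):
    """Return (distance, predecessor) maps for every reachable routing group."""
    distance = {source: 0}
    predecessor = {}
    done = set()             # groups whose shortest path is final
    queue = [(0, source)]    # (current best distance, routing group)
    while queue:
        dist, group = heapq.heappop(queue)   # closest remaining group
        if group in done:
            continue                         # stale queue entry
        done.add(group)
        for neighbor, cost in topology[group].items():
            candidate = dist + cost
            # Update only when the new estimate is strictly smaller.
            if neighbor not in done and candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                predecessor[neighbor] = group
                heapq.heappush(queue, (candidate, neighbor))
    return distance, predecessor


# Topology inferred from the walkthrough; every connector has the default cost of 1.
LINKS = ["AB", "AC", "BC", "BD", "BE", "CD", "CE", "DE"]
TOPOLOGY = {group: {} for group in "ABCDE"}
for a, b in LINKS:
    TOPOLOGY[a][b] = 1
    TOPOLOGY[b][a] = 1

distance, predecessor = shortest_paths(TOPOLOGY, "A")
print(distance)
print(predecessor)
```

With this topology, routing groups B and C end up at distance 1 with predecessor A, and routing groups D and E end up at distance 2 with predecessor B, as in the steps above.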

Message Transfer Load Balancing

If multiple paths with the same cost value exist, the routing engine selects a transfer path at random, as outlined in the previous steps. However, the routing engine does not perform load balancing. As explained earlier, the routing engine caches the message type information that refers to the shortest path a message can take to its destination. Therefore, all messages of the same type travel the same path, even if another path with the same cost value exists (for example, "routing group A > routing group B > routing group E" and "routing group A > routing group C > routing group E").

Load Balancing between Routing Groups

True load-balancing between routing groups can be achieved only by using one routing group connector with multiple bridgehead servers.

The following table lists the load-balancing configurations that you can use between routing groups.

Possible configurations between routing groups

Possible configuration Comments

A single routing group connector with multiple source or multiple destination bridgehead servers, or both.

With these types of connectors, the routing engine returns the connector GUID in the next hop information for the advanced queuing engine. The advanced queuing engine then randomly selects the bridgehead server that must be used, thereby load-balancing the message transfer across all bridgehead servers.

If a message reaches a source bridgehead server of a routing group connector with multiple source bridgehead servers, the message is not rerouted to any other source bridgehead server. After the message reaches the routing group connector, message transfer to the destination routing group is direct. Therefore, users who have mailboxes on the bridgehead server always use the local server for message transfer to the destination routing group.

Note

It is best to specify multiple source and destination bridgeheads for a single routing group connector between two routing groups. This practice improves load-balancing and redundancy.

Multiple connectors with the same address space (or connected routing group), same weight (cost), and each with a single source and destination bridgehead server

In this type of configuration, true load balancing is not achieved. Load balancing is performed only to the extent of selecting a connector initially for a given message type. The routing engine determines the message type one time, caches this information, and then routes all messages of the same type over the same connector. The second connector is used only if the first connector fails. However, a second server might select the second connector and in this way balance the load to some extent.

Note

It is not a good practice to use multiple connector instances between routing groups for load balancing, because true load balancing cannot be achieved.

Multiple connectors with the same address space (or connected routing group), different weights (cost), and each with a single source and destination bridgehead server

If you want to configure connectors to fail over automatically, you can create two separate connectors on different bridgehead servers, each with a different cost. Link state for a connector is determined by its local bridgehead server. If the bridgehead server on the preferred connector with the lowest cost is unavailable, the connector is considered to be unavailable and routing automatically chooses the second connector. When the bridgehead server that is hosting the connector with the lowest cost becomes available again, Exchange servers resume using it.

Load Balancing between Connectors and External Systems

Depending on the scenario, there are a few ways to achieve load balancing across SMTP connectors.

  • If you want to load-balance outbound requests across multiple servers in the sending organization, configure multiple source bridgeheads.

  • If you want to load-balance traffic across multiple destination servers, either have the destination administrator configure DNS correctly (using a suitable configuration of MX and A records), or specify multiple smart host addresses on the connector.

Or, if you want to ensure failover resiliency, create multiple SMTP connectors with the same address space, different costs, and different source bridgeheads. If the bridgehead server on the preferred connector with the lowest cost is unavailable, the connector is considered unavailable and routing automatically chooses the second connector. If you use two connectors with the same cost, Exchange servers randomly select which bridgehead server and connector to use. Then, if this bridgehead server becomes unavailable, they fail over to the second connector. However, when the first bridgehead server becomes available again, the servers do not fail back to it, because that route has the same cost as the route that they are already using.

The connector configuration in the following figure, for example, provides neither load balancing nor failover, because the address spaces do not match. Messages sent to external users in a .NET domain always travel over the SMTP connector with the .NET address space. This is because the routing engine chooses the most detailed address space before evaluating costs.

A connector configuration that does not provide load balancing or fault tolerance

Note

If restrictions exist on the connector with the *.NET address space, and the restrictions prevent certain messages from crossing this connector (for example, because the sender is denied message transfer over this connector), the routing engine returns the message to the sender with an NDR. The routing engine does not fall back to the second connector for those messages. The most detailed address space determines which connectors can be used to transfer a message. Connectors with less detailed address spaces are excluded from the route calculation.
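
The rule that the most detailed address space is chosen before costs are compared can be sketched in Python as follows. The class and function names are hypothetical, the wildcard matching is simplified, and using the length of the address space string as the measure of detail is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class SmtpConnector:
    name: str
    address_space: str   # for example "*" or "*.net"
    cost: int
    state_up: bool = True


def matches(address_space, domain):
    """Return True if the wildcard address space covers the recipient domain."""
    if address_space == "*":
        return True
    if address_space.startswith("*."):
        return domain.lower().endswith(address_space[1:].lower())
    return domain.lower() == address_space.lower()


def select_connector(connectors, domain):
    candidates = [c for c in connectors if matches(c.address_space, domain)]
    if not candidates:
        return None
    # Keep only the connectors with the most detailed (longest) matching address space.
    best = max(len(c.address_space) for c in candidates)
    candidates = [c for c in candidates if len(c.address_space) == best]
    # Cost is compared only among connectors that share that address space.
    available = [c for c in candidates if c.state_up]
    return min(available, key=lambda c: c.cost) if available else None


connectors = [SmtpConnector("any-domain", "*", 1), SmtpConnector("net-only", "*.net", 5)]
```

Here `select_connector(connectors, "example.net")` returns the `net-only` connector even though its cost is higher, and it returns `None` rather than the `any-domain` connector if `net-only` is unavailable.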

If a connector cannot transfer messages, the advanced queuing engine notifies the routing engine of a link failure. This might cause the routing engine to mark the connector as down, in which case all queued messages are rerouted.

The routing engine considers a connector as down if one of the following conditions is true:

  • The routing engine cannot establish a connection to any of the connector's source bridgehead servers, and there is no TCP/IP connection to port 691 between the routing group master and the source bridgehead servers. Unavailable source bridgehead servers are marked as VS_NOT_STARTED in the link state table.

  • None of the source bridgehead servers can transfer the message to a destination bridgehead server successfully. Source bridgehead servers that cannot transfer messages to the destination are marked as CONN_NOT_AVAIL.

Note

If you use X.400 connectors, and the connector cannot transfer messages, the Exchange MTA informs the routing engine that a link failure occurred. The state of the source bridgehead server is then CONN_NOT_AVAIL. X.400 connectors can have only one source bridgehead server.

Message Rerouting

To guarantee efficient message transfer, the routing engine informs the advanced queuing engine and Exchange MTA immediately of any link state changes. To avoid sending messages along broken paths, all queued messages must be routed again. This process is called rerouting. In rerouting, the advanced queuing engine discards all cached next hop information, because this information is no longer valid. Each message that is currently waiting to be transferred is passed to the routing engine again, to recalculate the next hop. This can be a resource-intensive task.

The following figure shows a rerouting example in which the bridgehead server in routing group E is down. No messages can reach this routing group currently. When the routing engine recalculates the shortest paths for messages to recipients in routing group E, it discovers that no path is available. Connectors marked as down are excluded from the routing process. Therefore, routing group E is currently isolated.

Broken routing group connectors

Because no valid path exists, the routing engine cannot determine a valid next hop for messages that are waiting to be transferred to routing group E. The routing engine informs the advanced queuing engine, in the next hop type information, that the next hop is currently unreachable. The advanced queuing engine must retain the message until at least one transfer path becomes available, or until the message expires and is returned to the sender with an NDR.

Note

If only one connector to a routing group exists, and there are no alternative paths, the link state is always marked as available to reduce the number of link state changes in the routing topology. Exchange Server 2003 queues the messages and sends them when the route becomes available again.

Rerouting and Address Spaces

As with load-balancing, Exchange Server 2003 reroutes messages only over connectors that have the same address space. For example, you can create two separate connectors on different bridgehead servers, each with the same address space but different costs. If the preferred connector becomes unavailable, the routing engine automatically selects the second connector, until the primary connector becomes available again.

Note

The routing engine does not reroute messages from a connector with a specific address space to a connector with a less specific address space, because the routing engine considers this a different destination. The messages remain on the source bridgehead server until the connector with the detailed address space becomes available.

If there are restrictions on the connector with the .NET address space, and the restrictions prevent certain messages from crossing this connector, for example because the sender is denied message transfer over this connector, the routing engine returns the message to the sender with an NDR. The routing engine does not fall back to the second connector for those messages. The most detailed address space determines which connectors can be used to transfer a message. Connectors with less detailed address space are excluded from the route calculation.

Connector Recovery

The routing engine determines that a connector is available again in one of the following ways:

  • VS_NOT_STARTED   The routing group master establishes a connection to TCP port 691 on the source bridgehead server. The source bridgehead server is marked as CONN_AVAIL, and because at least one source bridgehead server is available for the connector, the connector state switches to STATE UP.

  • CONN_NOT_AVAIL   For unavailable connectors, the source bridgehead servers continue to retry connection at 60-second intervals, even if no messages are waiting for transfer. When a connection is established, the advanced queuing engine or the Exchange MTA reports to the routing engine an outbound connection success from the source bridgehead server. The routing engine then switches the source bridgehead server to CONN_AVAIL and the connector to STATE UP.
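
The relationship between bridgehead states and connector state can be summarized in a few lines of Python. This is a simplified sketch: the state names are taken from the text, but the function names and the dictionary that maps server names to states are assumptions made for illustration.

```python
from enum import Enum


class BridgeheadState(Enum):
    CONN_AVAIL = "CONN_AVAIL"
    VS_NOT_STARTED = "VS_NOT_STARTED"
    CONN_NOT_AVAIL = "CONN_NOT_AVAIL"


def connector_state(bridgeheads):
    """Return STATE UP if at least one source bridgehead is available, else STATE DOWN."""
    if any(state is BridgeheadState.CONN_AVAIL for state in bridgeheads.values()):
        return "STATE UP"
    return "STATE DOWN"


def report_connection_success(bridgeheads, server):
    """Record a successful connection and return the resulting connector state."""
    bridgeheads[server] = BridgeheadState.CONN_AVAIL
    return connector_state(bridgeheads)


# Hypothetical connector with two source bridgehead servers, both unavailable.
source_bridgeheads = {
    "bridgehead-1": BridgeheadState.VS_NOT_STARTED,
    "bridgehead-2": BridgeheadState.CONN_NOT_AVAIL,
}
```

In this model, a single successful connection from either bridgehead server is enough to switch the connector back to STATE UP, while the other server keeps its own state.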

Rerouting and Activation Schedules

All connector types let you configure a schedule for the connector so that you can transfer e-mail messages at specific times. Connectors can be configured to be always active, to become active only at specified times, or to be never active, in which case the connector does not transfer messages until the connector schedule is changed again. You can also configure a connector as remote initiated, which means that the connector does not initiate a connection itself. Instead, it waits for a remote server to connect and trigger the message transfer.

The connector schedule affects the message transfer only. It does not affect message routing. The routing engine considers connectors with any schedule type as available if they are STATE UP. Therefore, messages might even be routed to connectors for which the activation schedule is set to never. Link state changes and rerouting do not occur for these connectors. Messages wait in the connector's queue until the activation schedule is changed. The same is true for remote initiated connectors. Messages are not rerouted while they are waiting for their retrieval.

Tip

If you want to avoid message routing to a connector, set its maximum message size to 1 kilobyte (KB).