Chapter 8 - The Microsoft QoS Components

In this section, we'll describe the QoS components provided in the Microsoft Windows family of operating systems and how they are used to implement the mechanisms described above. Windows 98 contains only user-level components, including:

  • The application component described in Section 8.1.1.

  • The Winsock2 and GQoS APIs described in Section 8.1.2.

  • The QoS service provider described in Section 8.1.3.

The Windows 2000 operating system contains all of the above as well as all the other components described in this section.

There are two primary groups of QoS components - those that reside in the host protocol stack and those that comprise the SBM and Admission Control Service (ACS).

The Host Protocol Stack

The following diagram illustrates the host protocol stack:

[Figure: The host protocol stack]

In the following paragraphs, we'll describe each QoS related component.

Application

Applications reside at the top of the stack. They may or may not be QoS-aware and may require guarantees of varying quality. We recommend that session-oriented applications that can benefit from QoS use the Generic QoS (GQoS) API. This is especially important for applications requiring high quality guarantees. QoS-aware applications invoke the services of the underlying QoS service provider (QoS SP) via the GQoS API. It is strongly recommended that ISVs implement the minor changes required to add GQoS support to Winsock2 applications. Network administrators are also expected to require multimedia applications to conform to the GQoS API specification so that they can be broadly deployed without abusing network resources. Mission-critical, non-multimedia applications, such as client/server database applications, will have to conform to the GQoS API in order to enable network administrators to prioritize these applications on corporate networks.

Certain management utilities may be used to invoke QoS on behalf of applications that are not QoS-aware. These work via the traffic control API (TC API). Applications that are not QoS-aware will not be able to receive the quality of guarantees that would otherwise be achievable unless the underlying network is accordingly over-provisioned.

The following applications are currently enabled to use the GQoS API:

  • NetMeeting conferencing software (Windows 98 and Windows 2000)

  • TAPI 3.0 (Windows 2000)

Following the release of Windows 2000:

  • Windows Media Technologies

  • A major enterprise resource planning (ERP) application

  • Other multimedia and non-multimedia applications, to be announced

Winsock2 & GQoS API

The Winsock2 API is a common API for use by network applications. Several Winsock commands carry QoS parameters and can be used to invoke QoS services from the operating system. These commands comprise a subset of the Winsock2 API, known as the GQoS (generic QoS) API. The purpose of this API is to enable applications to invoke the QoS they need with little understanding of the QoS mechanisms available or the specific underlying network medium. The API is very abstract and requires only very simple directives from the application. For applications that are QoS savvy and that do want additional control over the underlying mechanisms, extensions to the API provide additional control.

In the spirit of simplifying the interface presented to the application programmer, the GQoS API does not expose RSVP, diffserv, 802.1p or any other protocol or media-specific QoS mechanism to the application programmer. Instead, the sending application programmer specifies one of the following services:

  • Guaranteed (generally specified for low and bounded latency applications, such as interactive voice)

  • Controlled Load (generally specified for applications that are somewhat jitter tolerant but require the appearance of a lightly loaded network with a specific capacity, for example, streaming video)

  • Qualitative (specified for applications that require better than best-effort service but are unable to quantify their requirements)

In addition to specifying a service, the sending application is expected to provide an indication of its average sending rate. It is recommended that applications also include an application ID and sub application ID (corresponding to the specific application sub-flow, such as print flow vs. time-critical database transaction). The application IDs are especially important for applications invoking the Qualitative service, as these provide no quantitative criteria by which to evaluate the application's impact on the network. Receiving applications must, at a minimum, indicate to the GQoS API that they are interested in network QoS. Certain qualitative applications may be allotted network QoS in response to the sender's use of GQoS, with no requirement for the receiver to invoke the GQoS API.
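
To make the sender's side of this contract concrete, here is a minimal Python sketch. It is a hypothetical model, not the Winsock2 GQoS structures: the `QosRequest` class, its field names, and the `problems` check are invented for illustration, and the rule that only quantitative services need a rate is our reading of the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ServiceType(Enum):
    GUARANTEED = "guaranteed"
    CONTROLLED_LOAD = "controlled load"
    QUALITATIVE = "qualitative"


@dataclass
class QosRequest:
    """Hypothetical model of what a sender supplies (not the real GQoS API)."""
    service: ServiceType
    average_rate_bps: Optional[int] = None   # average sending rate, if quantifiable
    app_id: Optional[str] = None
    sub_app_id: Optional[str] = None         # e.g. print flow vs. database transaction

    def problems(self) -> list:
        """List the ways this request falls short of the recommendations."""
        issues = []
        quantitative = self.service is not ServiceType.QUALITATIVE
        if quantitative and self.average_rate_bps is None:
            issues.append("quantitative service without an average sending rate")
        if self.app_id is None:
            if quantitative:
                issues.append("no application ID (recommended)")
            else:
                # Qualitative flows carry no numbers, so the ID is all the network has.
                issues.append("qualitative service without an application ID")
        return issues
```

A voice sender that supplies a rate and an application ID yields an empty list, while a qualitative request with no application ID is flagged, since the network would have nothing by which to evaluate it.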

The underlying QoS service provider coordinates the various QoS mechanisms in the network in response to the application's request. These mechanisms include RSVP signaling and traffic scheduling, as well as DSCP marking and 802.1p tagging (see footnote 1) based on the results of signaling.

The QoS Service Provider

The QoS service provider (QoS SP) is the entity that responds to the GQoS API. It provides the following services:

  • RSVP signaling

  • QoS policy support

  • Invocation of traffic control

RSVP Signaling
RSVP signaling is generated by default on behalf of applications using the GQoS API. The QoS SP initiates and terminates all RSVP signaling on behalf of the applications. It provides status regarding reservation state to applications that are interested, but does not require the application to understand RSVP signaling.

SBM Client Functionality
The QoS SP provides full SBM client functionality. This means that it detects the presence of a DSBM on a shared subnet and routes signaling requests via the DSBM (as opposed to the next layer-3 hop). In addition, the QoS SP presents the results of the DSBM's advertised NonResvSendLimit (see section 6.3.7) to applications via the GQoS API. This enables GQoS compliant applications to avoid or restrict their sending in response to administrator policies.

QoS Policy Support
In support of QoS policy, the QoS SP inserts a Kerberos encrypted Windows NT user ID into RSVP signaling messages, both on sender and receiver. In addition, the QoS SP inserts any application identification provided by the application via the GQoS API. The inserted objects identify the Microsoft Windows NT operating system user and application such that application and/or user-specific policy can be applied in the network.

Invocation of Traffic Control
The QoS SP actually enforces policy by invoking traffic control in accordance with the network's response to signaling messages. In general, the QoS SP identifies two types of traffic control: greedy traffic control and non-greedy traffic control. Non-greedy traffic control is invoked in immediate response to an application's request for QoS. Greedy traffic control is enabled only if (and to the degree) approved by the network, in response to the RSVP signaling.

The TC API is quite complex and provides a high degree of control. The QoS SP abstracts the complexity of the TC API via the GQoS API such that applications can remain relatively simple.

The Traffic Control API

The TC API provides the QoS SP and third party traffic management applications with a high degree of control over traffic control (TC) functionality in the kernel. The fundamental APIs that comprise the TC API are CreateFlow and CreateFilter. CreateFlow causes a flow to be created in the kernel network stack. The flow has certain actions and characteristics associated with it. These include marking behavior (DSCP, 802.1p, and other media-specific marks or tags), packet scheduling behavior and other media-specific behavior, as appropriate. CreateFilter is called to attach a filter to a flow. A filter specifies classification criteria, which determine the set of packets that will be directed to the associated flow. Multiple filters may be attached to a single flow. Filters may be fully specific (no wildcards) or may include wildcards. The generic packet classifier (GPC) is used for the purpose of packet classification. Scheduling parameters are expressed using the common token-bucket model. Filters are expressed in the form of an IP 5-tuple and a mask.
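
The relationship between filters and flows can be sketched in a few lines of Python. This is an illustrative model only, not the TC API or the generic packet classifier: the `Packet`, `Filter`, and `classify` names are hypothetical, address masks are expressed as CIDR prefixes, and first-match ordering is an assumption, since the text does not say how overlapping filters are resolved.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: int       # e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class Filter:
    """Hypothetical 5-tuple filter; a field left as None is a wildcard."""
    flow: str                          # name of the flow this filter is attached to
    src_net: Optional[str] = None      # address plus mask, written as a CIDR prefix
    dst_net: Optional[str] = None
    protocol: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        def addr_ok(net, addr):
            return net is None or ipaddress.ip_address(addr) in ipaddress.ip_network(net)

        def field_ok(want, got):
            return want is None or want == got

        return (addr_ok(self.src_net, pkt.src)
                and addr_ok(self.dst_net, pkt.dst)
                and field_ok(self.protocol, pkt.protocol)
                and field_ok(self.src_port, pkt.src_port)
                and field_ok(self.dst_port, pkt.dst_port))


def classify(filters, pkt):
    """Return the flow of the first matching filter, or None if unclassified."""
    for f in filters:
        if f.matches(pkt):
            return f.flow
    return None
```

Two filters naming the same flow model the statement that multiple filters may be attached to a single flow; a packet that matches no filter simply receives no special treatment.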

Note that, at present, the TC API and the corresponding functionality are applicable to transmitted traffic only. In future versions of Windows operating systems, TC functionality will be available to control the treatment of received traffic as well as transmitted traffic. Traffic control functionality is available in Windows 2000, but not in Windows 98 (with the exception of limited DSCP marking).

The TC API separates traffic control consumers from traffic control providers. In the illustration above, the QoS service provider is a traffic control consumer, while the packet scheduler and ATM network card are traffic control providers.

Traffic Control Providers
Traffic control providers include all modules that implement any traffic control functionality in response to the traffic control API. Traffic control functionality available in Windows 2000 includes:

  • Packet scheduling

  • 802.1p marking

  • DSCP marking

  • ISSLOW link layer fragmentation (per PPP multilink) for latency reduction on slow links

  • ATM VC control and cell scheduling

The packet scheduler component is implemented as an intermediate driver. It provides traffic control functionality over standard LAN adapters, as well as over NDISWAN and WAN drivers. Since ATM LANE presents an Ethernet interface to the network stack, the packet scheduler also provides traffic control over LANE. On the other hand, classical IP over ATM (CLIP) provides traffic control functionality directly to the traffic control API without requiring the packet scheduler. Additional traffic control providers planned for the future include cable modem drivers, P1394 drivers, and other media-specific drivers.

Packet Scheduler

The packet scheduler is used to provide traffic control over drivers and network cards that have no inherent packet scheduling capability. It schedules packets on separate QoS queues as created via the TC API. It also is responsible for effecting the marking of DSCPs and media-specific priority tags (such as 802.1p) on transmitted packets.

Scheduling
The scheduling components of the packet scheduler include:

  • A conformance analyzer, which checks packets for conformance to a traffic descriptor

  • A shaper, which delays packets until they can be legitimately transmitted per the traffic descriptor (non-work-conserving queuing)

  • A sequencer, which determines the sequence in which packets from different flows may access the link when it is congested
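
The first two components can be illustrated with a small token-bucket sketch in Python. It is a generic textbook token bucket, not the packet scheduler's actual code: the class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and packets are assumed to be no larger than the bucket depth.

```python
class TokenBucket:
    """Token-bucket traffic descriptor: rate in bytes/second, depth in bytes."""

    def __init__(self, rate, depth):
        self.rate = float(rate)
        self.depth = float(depth)
        self.tokens = float(depth)   # the bucket starts full
        self.last = 0.0              # time of the last update, in seconds

    def _refill(self, now):
        # Tokens accumulate at `rate`, but never beyond the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def conforms(self, now, size):
        """Conformance analyzer: consume tokens and return True if the packet conforms."""
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

    def delay_until_conforming(self, now, size):
        """Shaper: seconds to hold the packet until enough tokens exist (0.0 if none)."""
        self._refill(now)
        shortfall = size - self.tokens
        return max(0.0, shortfall / self.rate)
```

With a rate of 1000 bytes per second and a depth of 1500 bytes, a 1500-byte packet at time 0 conforms and drains the bucket; a 500-byte packet offered immediately afterward does not conform and would be held for 0.5 seconds by a shaper.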

Flows may be individually configured in the packet scheduler for variations of the following modes:

  • Borrow mode - allows traffic on the flow to borrow resources from higher priority flows that are temporarily idle (at the expense of being marked non-conforming and demoted in priority)

  • Shape mode - delays packets submitted for transmission until they conform to a specified traffic descriptor (non-work conserving)

  • Discard mode - discards packets that do not conform to a specified traffic descriptor.
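
The three modes differ only in what happens to a non-conforming packet. The following Python sketch summarizes that difference under simplifying assumptions: `Mode` and `handle_packet` are hypothetical names, the conformance verdict is supplied by the caller, and borrow mode is reduced to "send, but flag for demotion" without modeling whether idle capacity is actually available.

```python
from enum import Enum


class Mode(Enum):
    BORROW = "borrow"
    SHAPE = "shape"
    DISCARD = "discard"


def handle_packet(mode, conforming, seconds_until_conforming=0.0):
    """Return (action, detail) for one packet, given the analyzer's verdict."""
    if conforming:
        return ("send", "conforming")
    if mode is Mode.BORROW:
        # Eligible to use idle capacity, at the cost of being demoted in priority.
        return ("send", "non-conforming, demoted")
    if mode is Mode.SHAPE:
        # Held until the traffic descriptor permits it (non-work-conserving).
        return ("delay", seconds_until_conforming)
    # Discard mode: the packet is dropped.
    return ("drop", "non-conforming")
```

Conforming packets are treated identically in all three modes; only the handling of excess traffic changes.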

By default, the packet scheduler implements a mapping from requested service type to one of these modes, and to an internal priority level, as follows:

Service Type          Mode          Priority
Network Control (1)   Borrow mode   Highest priority
Guaranteed Service    Shape mode    High priority
Controlled Load       Borrow mode   Medium priority
Qualitative           Borrow mode   Low priority
All other traffic     Borrow mode   Lowest priority (2)

1 This service type may be requested via the traffic control API, but is not available via the GQoS API. It is reserved for use by critical traffic management applications.
2 Note that packets deemed non-conforming to the traffic descriptor are demoted in priority to a level lower than that of best-effort traffic. This demotion may be reflected in internal sequencing as well as marking and tagging.

These defaults may be overridden as appropriate.
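
As an illustration of how the default priorities and the demotion of non-conforming packets interact, here is a small Python sketch of a sequencer. It assumes strict priority with first-in, first-out order within a level, which is a simplification and not a description of the actual scheduler; the `Sequencer` class and the numeric levels are hypothetical.

```python
import heapq
from itertools import count

# Default internal priority per service type (0 is served first), per the table above.
DEFAULT_PRIORITY = {
    "network control": 0,
    "guaranteed": 1,
    "controlled load": 2,
    "qualitative": 3,
    "best effort": 4,     # "all other traffic"
}
NON_CONFORMING = 5        # demoted below best effort, per the table's second footnote


class Sequencer:
    """Strict-priority sequencer; FIFO within a priority level (an assumption)."""

    def __init__(self):
        self._heap = []
        self._order = count()   # tie-breaker that preserves arrival order

    def enqueue(self, packet, service_type, conforming=True):
        level = DEFAULT_PRIORITY[service_type] if conforming else NON_CONFORMING
        heapq.heappush(self._heap, (level, next(self._order), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]
```

If a best-effort packet, a non-conforming controlled-load packet, a conforming controlled-load packet, and a network-control packet are queued in that order, they leave in the order network control, controlled load, best effort, and finally the demoted packet.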

The packet scheduler provides the flexibility to invoke a broad range of traffic control functionality, including both work-conserving and non-work-conserving schemes, the ability to proportionately share link resources (such as in weighted fair queuing), and so forth. It is possible to simultaneously configure different flows for different modes.

Marking
In addition to scheduling, the packet scheduler effects the marking of transmitted packets. The reason that this functionality is mediated via the packet scheduler is to enable it to demote non-conforming packets. By default, packets are marked based on a mapping from the service type associated with a flow, according to the following mapping:

Service Type          DSCP (IP precedence)   802.1p
Network Control       30 (6)                 7
Guaranteed Service    28 (5)                 5
Controlled Load       18 (3)                 3
Qualitative           0 (0)                  0
All other traffic     0 (0)                  0

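Note: The actual DSCP is a six-bit field carrying the value indicated. The three high-order bits of the DSCP occupy what was formerly referred to as the IP Precedence field; the equivalent IP precedence values are shown in parentheses. The DSCP values appear to be written in hexadecimal: read as hexadecimal, 30, 28, and 18 (48, 40, and 24 decimal) have high-order bits equal to 6, 5, and 3, matching the parenthesized values, whereas the decimal readings would not.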
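
The relationship between a DSCP and its IP precedence is simple bit arithmetic, sketched below in Python. The function names are hypothetical, and the table values are interpreted as hexadecimal, which is an assumption based on the fact that only that reading reproduces the parenthesized precedence values.

```python
def ip_precedence(dscp):
    """Return the three high-order bits of the six-bit DSCP."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a six-bit value")
    return dscp >> 3


def tos_byte(dscp):
    """Place the DSCP in the upper six bits of the IPv4 TOS (DS) octet."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a six-bit value")
    return dscp << 2


# Default DSCPs from the table, read as hexadecimal (an assumption).
DEFAULT_DSCP = {
    "network control": 0x30,
    "guaranteed": 0x28,
    "controlled load": 0x18,
    "qualitative": 0x00,
}
```

For example, `ip_precedence(0x28)` is 5, matching the Guaranteed Service row, whereas decimal 28 would give 3.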

There are several cases in which the default mapping may be overridden. These are described below. (Note that this describes marking behavior in response to the TC API, which bypasses the policy mechanisms of the QoS SP and the network. Consequently, this behavior does not fully describe marking in response to the GQoS API and network policy, which is mediated by the QoS SP. For information regarding marking in response to the GQoS API, see section 8.3.)

  • Non-conformance - packets that the packet scheduler deems non-conforming to the traffic descriptor provided may be marked with a mark other than the default mapped from the service type. Typically, the mark will indicate a lower priority than that which would be applied to conforming packets.

  • Registry override - it is possible to define new static mappings in the registry. These can be defined on a per-interface basis. Mappings can be defined both for conforming and non-conforming packets.

  • TCLASS and DCLASS - these objects can be supplied with the CreateFlow API (or the related ModifyFlow API) at any time, to dynamically override the 802.1p or DSCP marking, respectively, for the flow. These objects are not directly accessible to applications using the GQoS API. Rather, the network is expected to signal these to the QoS SP, which in turn provides them to traffic control via the TC API.
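
One way to picture how these overrides combine is the Python sketch below. The order of precedence (default, then registry mapping, then DCLASS or TCLASS) and the choice to apply DCLASS and TCLASS to conforming packets only are assumptions made for illustration; the text does not specify either. The `resolve_marks` function and the shape of its arguments are hypothetical.

```python
def resolve_marks(service_type, conforming, defaults,
                  registry=None, dclass=None, tclass=None):
    """Return (dscp, user_priority) for a packet on a flow.

    `defaults` and `registry` map (service_type, conforming) to a
    (dscp, user_priority) pair. `dclass` and `tclass` are per-flow overrides
    of the DSCP and the 802.1p value, respectively.
    """
    key = (service_type, conforming)
    dscp, user_priority = defaults[key]
    if registry and key in registry:
        # Static per-interface mapping replaces the built-in default.
        dscp, user_priority = registry[key]
    if conforming:
        # Assumption: dynamic overrides apply to conforming packets only.
        if dclass is not None:
            dscp = dclass
        if tclass is not None:
            user_priority = tclass
    return dscp, user_priority
```

Here a DCLASS changes only the DSCP and a TCLASS changes only the 802.1p value, mirroring the "respectively" in the description above.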

Note that the packet scheduler marks neither DSCP nor 802.1p directly. Rather, it effects this marking. In the case of the DSCP, the marking is still performed by the core operating system. However, in the case of 802.1p, the marking is actually performed by the network card driver (or hardware) which generates the packets. The packet scheduler provides the network card driver a suggested 802.1p value with each packet. Ethernet drivers may use the suggested value directly. Other media drivers interpret the suggested value and map it to their media-specific link layer tagging or marking mechanism, as appropriate.

The Subnet Bandwidth Manager and Admission Control Service

The SBM and the ACS are Microsoft's QoS policy components.

The Subnet Bandwidth Manager

The SBM protocol defined by the IETF extends RSVP to be useful in a shared media subnetwork. In shared media subnets, there is no single agent accountable for the shared resources. The SBM protocol defines how agents in the subnetwork elect a Designated SBM (or DSBM). The DSBM then advertises its existence on the shared subnet and is accountable for the shared resources of the subnet. Devices sending RSVP PATH messages onto a shared subnet are required to detect the presence of the DSBM and to route their messages through the DSBM, instead of directly to the next layer 3 hop. The DSBM is then able to apply admission control based on the resources (and policies) of the shared subnet, before relaying the RSVP message to the next layer 3 hop.

Microsoft's ACS is a service that combines the resource-based admission control functionality of an SBM with policy based admission control using the Active Directory. The ACS leverages the fact that the SBM (by advertisement of its presence on a shared subnet) is able to insert itself into the RSVP reservation path and can, therefore, effect admission control. To use the ACS to apply policy-based admission control, it is necessary to enable the ACS on a Windows 2000 server. The SBM component of the ACS will then run for election with other DSBM capable devices on the same shared subnet. If the ACS is to be used for admission control on the subnet, it may be necessary to disable DSBM functionality on other devices (switches and routers) on the subnet.

The Local Policy Module and Extensibility
When PATH or RESV messages are intercepted by the SBM, they are handed off for policy processing by a Local Policy Module (LPM). Microsoft's LPM simply extracts the policy-related objects from the RSVP message, applies the appropriate Kerberos processing to the user ID, and compares the requesting user ID and the resources requested against privileges configured in the Active Directory. Based on the results of the comparison, the RSVP request is either admitted or rejected by the ACS. The interface between the SBM and the LPM is an open interface, the LPM API. Third party ISVs may use this interface to install alternate policy modules in the ACS. These policy modules may use intermediate third party policy servers rather than accessing the Active Directory directly. They may also be used to provide special resource-based admission control such as might be required in the case of cable modem head-ends. Multiple policy modules can be cascaded in series in a single ACS server.
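
The idea of cascading policy modules in series can be sketched as follows in Python. This is not the LPM API: the `Request` type and the two module factories are hypothetical, Kerberos processing is omitted, and a plain dictionary stands in for privileges stored in the directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    service: str
    rate_bps: int


def directory_policy(limits):
    """Build a module from a {(user, service): max_rate_bps} table."""
    def module(req):
        allowed = limits.get((req.user, req.service), 0)
        return req.rate_bps <= allowed
    return module


def subnet_capacity(capacity_bps):
    """Build a resource-based module that tracks bandwidth already admitted."""
    state = {"used": 0}

    def module(req):
        if state["used"] + req.rate_bps > capacity_bps:
            return False
        state["used"] += req.rate_bps
        return True
    return module


def admit(modules, req):
    """Admit the request only if every module in the series accepts it."""
    # all() stops at the first rejection, so the stateful capacity module is
    # placed last and commits bandwidth only for requests that pass policy.
    return all(m(req) for m in modules)
```

With a per-user limit of 500 kbps and a subnet capacity of 600 kbps, a first 400 kbps request from that user is admitted, a second identical request is rejected for lack of capacity, and a request from a user with no configured privilege is rejected by policy.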

The extensibility described in the previous paragraph enables third parties to use the ACS to apply policies against their policy servers. In this model, the ACS is acting as a policy enforcement point (PEP; see footnote 3). As discussed below, it is usually preferable to use a router as a PEP. Standard routers, acting as PEPs, use the COPS protocol to outsource policy decisions to a policy server, which in turn uses the Active Directory as the policy data store. In this environment, additional extensibility is provided by allowing Microsoft's LPM to be run on third party policy servers. This mode of operation enables the policy server to readily parse Microsoft's Active Directory resident QoS schema.

Applicability of the ACS

The functionality of the ACS is nothing more than standard PEP/PDP functionality. In the near future, routers and switches will provide this functionality, since these are the actual policy enforcement points and these are the devices through which reservation messages naturally flow. In the interim, Microsoft's host-based implementation of the ACS enables early adopters of QoS technology to effect policy-based admission control.

It is a common misconception that it is necessary to install an ACS on every subnet in order to benefit from QoS policy. This is not the case. An ACS enables significant control over network resources when installed even on a small number of carefully selected subnetworks. It is true that the SBM-based ACS functionality imposes awkward topological constraints in certain conditions. In particular, when it is necessary to apply policies that are specific to a point-to-point link (such as a WAN link), the SBM-based ACS cannot be readily used. In these circumstances, the routing and remote access service (RRAS)-based ACS should be used. The RRAS-based ACS provides point-to-point routing with ACS policy control and can be used, for example, to drive WAN links.

Variations of the ACS

The following diagram illustrates variations of the SBM/ACS described previously.

[Figure: Variations of the SBM/ACS]

The leftmost example illustrates a Windows 2000-based host on which the ACS service is enabled. The ACS uses the standard Microsoft LPM, which uses LDAP to directly access the Active Directory.

The center example shows the same SBM platform; however, the Microsoft LPM has been replaced with a third party COPS LPM. The COPS LPM accesses an intermediate policy server using COPS. The policy server retrieves policy data from the Active Directory, using LDAP. This configuration allows policy decisions to be offloaded from the ACS/SBM platform. The benefits of doing this are twofold. First, the network device that intercepts the QoS control messages can be a very lightweight device (since the policy decision work is done elsewhere). Second, a distributed set of policy servers can make distributed policy decisions.

Finally, the rightmost example illustrates an industry standard router. The router uses COPS to offload the policy decision to a third party policy server. The third party policy server may use Microsoft's LPM to parse Active Directory QoS schema.

How Hosts Mark and Shape Traffic Based on Network Policy

In the previous section, we discussed the use of the TC API to mark and shape traffic. Since marked packets may obtain resources in the network that would otherwise be available to other packets, we consider marking to be greedy behavior. As such, marking should be subjected to policy controls. Shaping, by comparison, can only reduce the rate of transmitted traffic (no amount of shaping can make a 10 Mbps Ethernet interface transmit faster than 10 Mbps). Therefore, shaping is considered non-greedy behavior and need not be subjected to policy controls.

In order to assure that packet marking is subjected to policy control, the TC API is made available only to administrative authorities (it can be invoked only by applications having administrator privileges for the operating system). These include the QoS SP and, possibly, additional network management applications. Non-administrative applications are unable to directly effect packet marking. Instead, these ask the QoS SP for a particular service level (and, in the case of quantitative applications, for a specific quantity of resources at this service level). The behavior of the QoS SP in response to application requests is described in the following paragraphs.

In general, the QoS SP applies non-greedy traffic control (requested shaping behavior) on behalf of the application as soon as the application requests QoS. At the same time, the QoS SP begins RSVP signaling to the network. Network devices along the data path review these signaling requests both as the PATH message flows downstream and as the RESV message flows back upstream. PEPs are able to assess the impact of the resource request on their available resources. They use PDPs to subject the request to verification against installed policies. When verifying admissibility, PEPs that use aggregate traffic handling assume default mapping from the requested intserv service level to an aggregate service provided by the device. Alternatively, PEPs and PDPs may work together to dictate an alternate mapping by returning to the host a DCLASS or TCLASS object (to effect marking of the DSCP or 802.1p tag for packets transmitted on the corresponding flow). Any PEP along the path may veto the reservation request due to insufficient resources or restrictive policies. A veto has the effect of refusing admission control to the requesting hosts and preventing the transmitting host from marking packets.
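
The end-to-end effect of this review can be sketched in Python as a walk over the hops on the data path. The model is deliberately reduced: each hop is a dictionary with a hypothetical `available_bps` budget and an optional `dclass` to return, only the upstream RESV pass is modeled, policy checks are folded into the resource check, and letting the hop nearest the sender decide the DCLASS is an assumption.

```python
def signal_reservation(path, rate_bps):
    """Walk the RESV upstream through `path` (ordered sender -> receiver).

    Each hop is a dict with 'name', 'available_bps', and optionally 'dclass'.
    Returns (admitted, vetoed_by, dclass).
    """
    dclass = None
    for hop in reversed(path):              # RESV flows from receiver to sender
        if rate_bps > hop["available_bps"]:
            return (False, hop["name"], None)   # any single hop may veto
        if hop.get("dclass") is not None:
            dclass = hop["dclass"]          # hop nearest the sender wins (assumption)
    for hop in path:
        hop["available_bps"] -= rate_bps    # commit only once every hop has admitted
    return (True, None, dclass)
```

A 64 kbps request across an edge router and a 128 kbps WAN hop is admitted and consumes capacity on both; a subsequent 100 kbps request is vetoed by the WAN hop and leaves the budgets of the other hops untouched.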

In the case that the RESV arrives at the transmitting host, the resource request has successfully transited all admission control agents in the network and may be considered admitted. Admission of a request permits the QoS SP on the transmitting host to invoke greedy traffic control, marking packets based on a default mapping, or according to a returned DCLASS or TCLASS object. As a result, packets are marked for priority only while the network approves the transmitting host’s resource request. Until the request is admitted (or at any time that the request is rejected or revoked), the QoS SP will not mark packets for better than best-effort behavior. The default mappings used by the host are as indicated in the tables in section 8.1.5.2. Note that for qualitative service, the default marks are equivalent to best effort. In order to cause traffic on qualitative flows to be marked for anything other than best-effort, it is necessary for a PEP to return a DCLASS or TCLASS object to the transmitting host.
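
On the transmitting host, this behavior amounts to a small state machine, sketched below in Python. The `FlowState` class and its method names are hypothetical and only DSCP marking is modeled; the sketch shows shaping enabled from the start while marking is gated on admission.

```python
class FlowState:
    """Hypothetical host-side view of one flow managed by the QoS SP."""

    BEST_EFFORT = 0

    def __init__(self, default_dscp):
        self.default_dscp = default_dscp
        self.shaping = True      # non-greedy control starts with the QoS request
        self.admitted = False    # greedy control (marking) waits for the network
        self.dclass = None

    def on_resv(self, dclass=None):
        """A RESV reached the sender: the request is admitted."""
        self.admitted = True
        self.dclass = dclass

    def on_reject_or_revoke(self):
        """The network rejected or revoked the request: stop marking."""
        self.admitted = False
        self.dclass = None

    def dscp_for_next_packet(self):
        if not self.admitted:
            return self.BEST_EFFORT
        return self.dclass if self.dclass is not None else self.default_dscp
```

A qualitative flow, whose default mark is zero, stays at best effort even after admission unless the network returns a DCLASS, which matches the final point of the paragraph above.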

Coordination of Greedy Behavior not Subjected to Policy

The QoS SP does not signal to the network for applications that do not generate persistent traffic. If it is necessary to mark traffic generated by these applications, this must be done either by network management applications making direct use of the traffic control API, or by the network itself. Persistent applications (that mark in response to signaling and policy) share the same resources as non-persistent applications or other applications that do not signal. Therefore, network management applications that effect the marking of traffic on behalf of non-signaling applications must be sure to reconcile the resources used by these applications against the resources used by signaling applications. The network administrator must enforce static limits on the type and quantity of resources available through signaled policy and those claimed by marking without signaling and policy, or must dynamically manage admission control to both pools of resources simultaneously. This requirement is described in section 4.4.

1 We will use the term marking to refer to both DSCP marking and 802.1p tagging.
2 Note that the DSBM election protocol defines a prioritization by device type. Highest priority is given to switches, with routers next and hosts last. This order favors devices that optimize the quality/efficiency product of the shared subnet. For optimal quality/efficiency product, a layer 2 subnet should be constructed entirely of DSBM capable switches with dedicated ports (no dumb hubs or yellow wires). In this case, the DSBM election protocol will divide the shared subnet into a set of managed segments, each controlled by a DSBM. The layer 2 network, from an RSVP perspective, will appear to have a routed topology. By comparison, if a host or router at the edge of the layer 2 subnet is the DSBM, it makes admission control decisions without detailed knowledge regarding the internal topology of the subnet. Therefore, a host or router DSBM reduces the quality/efficiency product of the subnet. On shared subnets, which are usually over-provisioned, the increased quality/efficiency product rarely justifies the increased overhead that results from every switch acting as a DSBM.
3 The ACS is a policy enforcement point in the sense that it is able to veto signaled admission control requests. Unlike router-based PEPs, it is not, strictly speaking, the final enforcer. Ultimate enforcement is the ability to forward packets or to not forward packets, which is reserved for devices that are actually in the data path. Nonetheless, by blocking admission control, the ACS is able to prevent the allotment of high priority resources to traffic on signaled flows.