Network Analysis and Optimization Techniques

Article
11/12/2007

By Daniel J. Nassar

Chapter 5 from Network Performance Baselining, published by New Riders Publishing

When performing a network baseline study, specific techniques enable an analyst to troubleshoot network issues. Some of these techniques involve processes discussed earlier in this book, such as utilization and quantitative measurement analysis. Unique methods exist to isolate specific traffic flow events, which can be very helpful during isolation and statistical baselining.

First, an analyst must engage the standard methodology as presented in the preceding chapter. Next, the analyst should apply techniques that provide a more focused review of dataflow from each specific baseline session. Some of these techniques can be applied to cause isolation; others can be used to optimize a network's performance (see Figure 5.1).

Figure 5.1: The main network optimization techniques.


Among the techniques involved are the following:

Physical health analysis
Broadcast storm analysis
Network capacity overload analysis
Network throughput analysis
Network end-to-end interpacket timing analysis
Transport and file retransmission analysis
Packet route and path cost analysis
End-to-end file transfer analysis
Drill-down data-decoding steps and specific techniques

The techniques listed above yield a body of data that is extremely valuable both when troubleshooting problems and when profiling data for a network baseline.

When reviewing a data trace during a network baseline session, an analyzer's Expert system output information or the internal indications in the data trace may immediately point to a specific type of problem, such as excessive physical errors. Rather than quickly taking the information from the Expert system and immediately attempting to troubleshoot the specific area of the network flagged as troublesome, the analyst should also further examine the internal data-trace results. It is highly likely that the data-trace internal view holds additional information that should also be reviewed and cross-mapped to a higher-level report information-extraction engine or Expert system screen results. Further examination of the data trace will most probably result in a more exact cause analysis mapping of the problem, yielding a more exact technical synopsis and applied recommendation.

This chapter describes each analysis technique. These techniques should be applied when performing a network baseline study. This is important so that you can isolate issues to their exact cause and profile data for optimization reasons.

Physical Health Analysis

Taking into account the many physical topologies in the LAN and WAN environments, it is natural to assume that many different physical error types may be encountered. When performing physical health analysis in a network baseline session, it is important to quickly note all information gathered in an Expert system or management system platform. Information related to error type, time of the error event, the associated addresses or network segment involved with the area in question, and protocol event sequence involved with the error must be clearly documented. The internal data-trace analysis results should then be cross-mapped to the output report of the error.

Many different host systems and internetwork device platforms (such as routers and switches) have their own internal method of reporting errors through various management platforms. Many different protocol analysis and management tools report error-count levels and time-of-occurrence conditions. All these systems yield valuable Error log or error-report information, which is a primary focus area when reviewing data. The Error log information must always be carefully gathered and documented.

For the purposes of illustrating error-report gathering, the following discussion describes how to apply this approach using a protocol analyzer tool.

Using a Protocol Analyzer for Error-Report Gathering

A protocol analyzer used during a network baseline session enables an analyst to quickly assess the types of errors encountered, the error count, and the time of occurrence. This relevant information proves extremely valuable because it assists in identifying the error event and the possible impact of the error on a network. In a reactive analysis session, this information is crucial and relevant to a rapid troubleshooting process. In a proactive network baselining session, this information is also important; however, the information can be quickly noted and documented for later review through the baseline process. This is especially important during a large internetwork study involving many different network areas, because certain physical errors may repeat and show a pattern of events that might be related to a center networking device, such as a main computer room hub, switch, or router. Within the protocol analysis capture, the error event is contained within a frame or a packet sequence, depending on the LAN or WAN topology.

It is important to cross-map the statistical Error log or output report from a specific management system (if one is present). It is important to map the Expert results with the internals of the protocol analysis dataflow events. The protocol analyzer error-reporting screens should be carefully examined for the key error event. After the error event information has been noted, the protocol analyzer should be paused or stopped. The capture should then be immediately saved to disk to ensure that the error frame or error frame occurrence is stored properly within the protocol analyzer platform. The data trace should then be opened and reviewed carefully, following a "page through the trace" process. The process of paging through the trace just involves slowly moving through the internal data gathered by the protocol analyzer to locate any unique anomalies. Some protocol analyzer Expert-based systems have a hotkey filtering system to quickly filter to the error event inside the data-trace results.

A protocol analysis trace taken during a network baseline session may contain a large number of packets and frames. Some data traces could contain 50,000 or 100,000 frames or even more within one single capture. It can be quite cumbersome to page through the complete data trace after the trace has been opened for review. An Expert system feature that provides hotkey filtering can facilitate an immediate extraction filter based on the error occurrence in the particular analyzer Expert system screen. The approach in this case is to highlight the error on the analyzer Expert system or statistical monitoring screen. After the error has been highlighted, an analyst can use this feature to filter quickly to the exact area of the error occurrence within the overall data trace. After the error event has been found, other relevant information in packets surrounding the error occurrence frame may also be identified; these may point to the actual cause of the error.

An inexperienced analyst may too quickly map the cause of a problem to an Expert system error-output report or a management system Error log. By using the actual data-trace error event and packet-sequence review process to access packet data around the error, it is possible to be more defined and accurate as to the cause analysis. In summary, a thorough review of the error frames within the trace may uncover packets or frames surrounding the error occurrence that may pinpoint the cause of a problem.
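The idea of filtering to an error event and then reviewing the packets around it can be sketched in a few lines of Python. This is only an illustration, not a feature of any particular analyzer: the `Frame` record, its field names, and the `frames_around_errors` helper are hypothetical, and the trace is assumed to have been exported as an ordered list.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    number: int        # position of the frame in the capture
    source: str        # source address as decoded by the analyzer
    destination: str   # destination address as decoded by the analyzer
    error: str = ""    # empty when the frame decoded cleanly


def frames_around_errors(trace, error_type, before=3, after=1):
    """Return the frames surrounding each occurrence of error_type.

    The window sizes (before, after) are arbitrary defaults chosen
    for illustration; widen them when the cause is not obvious.
    """
    selected = set()
    for index, frame in enumerate(trace):
        if frame.error == error_type:
            start = max(0, index - before)
            stop = min(len(trace), index + after + 1)
            selected.update(range(start, stop))
    # Overlapping windows are merged and returned in capture order.
    return [trace[i] for i in sorted(selected)]
```

Applied to a large capture, a helper like this reduces tens of thousands of frames to the handful that precede and follow each error, which is where the cause is most likely to be visible.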

Consider, for example, a 16Mbps Token Ring network that is operating abnormally from an upper-layer application standpoint. Users are complaining significantly about performance. In this case, the analyst should immediately deploy a protocol analyzer, because this is a rapid baseline situation (or a reactive analysis event).

As stated earlier, certain quantitative baseline measurements must be taken prior to topology error analysis, such as utilization, protocol percentages, and other statistical measurements. If the analyst moves through the initial steps in the quantitative baseline measurement process and notices a high number of error reports from the protocol analyzer indicating a high ring purge error MAC occurrence, this is a relevant event.

Assume, for example, that through an analysis session a high number of ring purge MAC frames are found within a Token Ring environment. The protocol analyzer could then just stop the capture, save the information, and filter on the ring purge events via a hotkey filtering system. The analyst could identify the ring purge MAC frames within the Token Ring trace analysis session. If, prior to the ring purge MAC frames, it is noted that excessive ring insertion failures are associated with a specific device, or excessive Soft Error Report MAC frames, this might indicate the cause of the ring purge error noted in the Expert system or Error log. Chapter 9, "Token Ring and Switched Environments," discusses Token Ring issues in more detail. This is just one example of how internal data-trace analysis, as associated with Expert system mapping, facilitates a cross-review process that yields a more accurate analysis.

Another illustration is an Ethernet internetwork that is showing a high number of corrupted CRC frames within the analyzer Expert system analysis screen. If the analyst filters on the artificial intelligence-based Expert screen displaying the corrupted-CRC Ethernet errors, the analyst should then move directly to the internal area of the trace that shows the CRC-corrupted error frames involved. By doing so, the analyst may determine that prior to the CRC frames, and possibly after the frames, certain frames indicate a high number of communication events on the Ethernet medium. Because the Ethernet medium engages a carrier sense multiple access/collision detection (CSMA/CD) sequence that is an ongoing process and is part of the Ethernet architecture, the cause analysis can be somewhat complex. Certain Ethernet frames that include errors such as a CRC type may be shorter than normal and may have physical addresses that cannot be interpreted. Because of this, the source and destination addresses related to the CRC error sometimes cannot be read. If, prior to the CRC error frames, the trace shows that a certain set of devices is communicating, it is quite possible (based on the operation of CSMA/CD within Ethernet) that these devices are involved in conversations when a high number of CRC errors are occurring.

If retransmissions of frames at the Ethernet level are occurring, it is very possible that the CRC errors that are not readable are related to the frames that communicated most recently prior to the CRC error. This is another example of how cross-mapping the internal data-trace results to the analyzer Expert system is invaluable to protocol analysis and network baselining.
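When the addresses in a CRC-corrupted frame cannot be read, one way to apply this reasoning is to tally which conversations appear immediately before each error frame. The sketch below assumes the trace has been exported as a list of dictionaries with hypothetical keys `src`, `dst`, and `crc_error`; the `lookback` depth is an arbitrary illustrative value.

```python
from collections import Counter


def talkers_before_errors(trace, lookback=5):
    """Count the conversations seen just before each CRC error frame.

    trace: time-ordered list of dicts with keys "src", "dst", and
    "crc_error" (True when the frame failed its CRC check).
    Returns (conversation, count) pairs, most frequent first.
    """
    conversations = Counter()
    for index, frame in enumerate(trace):
        if not frame["crc_error"]:
            continue
        # Look only at readable frames in the window before the error.
        for earlier in trace[max(0, index - lookback):index]:
            if not earlier["crc_error"]:
                conversations[(earlier["src"], earlier["dst"])] += 1
    return conversations.most_common()
```

A conversation that repeatedly tops this list is a candidate for closer review; it is a correlation, not proof that those devices generated the corrupted frames.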

Later in this book, specific topology techniques such as analysis of Token Ring errors, Ethernet errors, and WAN errors are discussed in detail. In the context of this discussion, however, the point is that more is involved in isolating errors via network baselining than just a simple review of protocol analyzer Expert screens or management system Error logs. All error reports encountered in these types of systems should be backed up by a close review of the internal data-trace results. The information should be cross-mapped between the management or Error log systems and the internal data-trace results. This method allows for a more accurate physical health analysis technique (see Figure 5.2).

Figure 5.2: Approach of physical health analysis.


Broadcast Storm Analysis

When encountering a broadcast storm in a network baseline session, an analyst can apply a specific technique to isolate the cause of the storm and the possible effect of the broadcast event on the internetwork.

A broadcast storm is a sequence of broadcast operations from a specific device or group of devices that occurs at a rapid frame-per-second rate that could cause network problems.

Network architecture, topology design, and layout configurations determine the network's tolerance level as it relates to frame-per-second broadcasts.

Consider, for example, a frame-per-second rate related to a broadcast storm generation of a specific protocol (Address Resolution Protocol [ARP], for example). Such generation, at more than 500 frames per second and on a continuing basis, is considered an abnormal protocol-sequencing event and can be extremely problematic.

The key here is to understand the difference between a normal broadcast event and an actual broadcast storm. When a normal broadcast event occurs, the broadcast is engaged from a specific physical device on a network for the express purpose of achieving a network communication cycle. There are conditions when a device, such as a router, broadcasts information to update other routers on the network to ensure that routing tables are maintained as consecutive and consistent related to internal route table information. Another standard broadcast event is when a device attempts to locate another device and requires the physical address or IP address of another device.

When a specific workstation device has a default gateway assigned, a "normal" broadcast event can occur. The device knows, for example, the target IP address of a device on the internetwork. It is common for this device to broadcast an ARP sequence to attempt to locate the target hardware address. ARP broadcasting is discussed in detail later in this book.

A workstation that broadcasts an ARP sequence to locate a target server but doesn't establish a broadcast resolve and doesn't receive a target hardware address for the server provides an example of an "abnormal" broadcast event. If the target device fails or the source broadcast operation mechanism or protocol-sequencing mechanism of the device fails, the source workstation device could start performing a loop ARP sequence that could be interpreted as a broadcast storm. Such an event in itself could cause a broadcast storm.

Figure 5.3: Broadcast storm analysis.


The point to be made here is that the frame-per-second rate of the broadcast sequence and the frequency of the broadcast sequence event occurrence can constitute an abnormal event.

Another example can be found in a Novell environment, when the Service Advertising Protocol (SAP) sequencing is engaged by specific servers. If the servers are broadcasting an SAP on standard NetWare sequence timing, the occurrence may take place on 60-second intervals. If there are hundreds or thousands of servers, the SAP sequence packets generated may become highly cumulative and affect areas of the enterprise internetwork that are not utilizing Novell processes.

In large internetworks, many of these concerns are addressed through protocol filtering within routers and switches in the network Layer 3 routing design. When a problem does occur because of an anomaly or possible misconfiguration of an internetwork, it is important to capture the information upon occurrence.

By applying an exact technique with a protocol analyzer, an analyst can very quickly capture a broadcast storm and identify the cause of the broadcast storm and develop a method to resolve the storm. Many different tools enable an analyst to achieve this. Almost all management systems for internetwork hubs, routers, and switches facilitate broadcast storm identification. The threshold that determines what is an actual broadcast occurrence versus an actual broadcast storm is usually set by the network manager or the configuring analyst of the network management platform.

The following discussion details the use of a protocol analyzer for broadcast storm analysis. When performing a data-analysis capture, a protocol analyzer is a useful tool for capturing a broadcast storm. Many protocol analyzers have thresholds that allow an artificial intelligence-based Expert system to identify a broadcast storm. A storm can be identified by preconfiguring a trigger or threshold that defines what constitutes a storm occurrence. When performing a network baseline, an analyst should always engage the threshold setting on the protocol analyzer prior to a baseline session.

Using a Protocol Analyzer for a Broadcast Storm

Based on the network architecture, the protocols, and the node count on a site being studied, an analyst must determine what constitutes a broadcast storm. This requires the analyst to be quite familiar with the topology and types of protocols and applications being deployed. A general benchmark is that a broadcast sequence occurring from a single device or a group of devices, either rapidly or on an intermittent cycle at more than 500 frames per second, is a storm event. At the very least, the sequence should be investigated if it is occurring at 500 frames per second (relative to just a few devices and a specific protocol operation).
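As a rough illustration of how such a threshold works, the following sketch scans a time-ordered capture with a sliding one-second window and reports when the broadcast rate first exceeds the benchmark. The 500 frames-per-second figure is the general benchmark given above; the function name, the tuple layout, and the use of the Ethernet broadcast address as the only test for a broadcast are assumptions made for this example.

```python
from collections import deque

BROADCAST = "ff:ff:ff:ff:ff:ff"   # Ethernet MAC broadcast address
STORM_THRESHOLD_FPS = 500         # general benchmark; tune per site


def find_storm_start(frames, threshold=STORM_THRESHOLD_FPS, window=1.0):
    """Return the time at which the broadcast rate first exceeds threshold.

    frames: time-ordered iterable of (timestamp_seconds, destination).
    Returns the timestamp of the oldest broadcast in the offending
    window, or None if the threshold is never exceeded.
    """
    recent = deque()
    for timestamp, destination in frames:
        if destination != BROADCAST:
            continue
        recent.append(timestamp)
        # Drop broadcasts that have aged out of the sliding window.
        while timestamp - recent[0] >= window:
            recent.popleft()
        if len(recent) > threshold:
            return recent[0]
    return None
```

The returned time gives the analyst a starting point inside the trace; whether the event is truly a storm still depends on the topology, protocols, and node count of the site.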

After the threshold has been set on the protocol analyzer, a data-trace capture should be started. After the capture has been invoked, and a broadcast storm event has occurred in the Expert system with notification or in the statistics screen, the time of the storm and the devices related to the storm should be carefully noted. The addresses should be noted in a log along with the time of the storm and the frame-per-second count. Most protocol analyzers provide this information before the capture is even stopped. As soon as the broadcast storm occurrence takes place, the analyzer should be immediately stopped to ensure that the internal data-trace information is still within the memory buffer of the protocol analyzer. The data trace should then be saved to a disk drive or printed to a file to ensure that the information can be reviewed. The data-trace capture should then be opened and the actual absolute storm time noted from the Expert system or the statistical screen. Based on the absolute time, it may be possible on the protocol analyzer to turn on an absolute time feature. When turned on in the data trace, the absolute time feature enables an analyst to search on the actual storm for the absolute time event. This may immediately isolate and identify the cause of the broadcast storm.
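Searching a saved trace for the absolute time noted on the Expert or statistics screen amounts to a sorted lookup. The sketch below assumes the frame timestamps have been exported as a sorted list of comparable values (seconds, or `datetime` objects); the function name is hypothetical.

```python
from bisect import bisect_left


def first_frame_at_or_after(timestamps, storm_time):
    """Return the index of the first frame at or after storm_time.

    timestamps: sorted list of absolute frame times.
    Returns None when the storm time falls after the last frame,
    which suggests the capture buffer did not hold the event.
    """
    index = bisect_left(timestamps, storm_time)
    return index if index < len(timestamps) else None
```

A `None` result is itself useful: it indicates that the capture was stopped too early or that the buffer had already wrapped before the trace was saved.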

Certain protocol analyzers offer hotkey filtering to move directly within the data-trace analysis results of the storm event. Either way, by using absolute time or hotkey filtering, the broadcast storm should be located within the data-trace capture.

Other metrics can be turned on in a protocol analysis display view when examining a broadcast storm, such as relative time and packet size. After the start of the storm has been located, the key devices starting and invoking the storm should be logged. Sometimes only one or two devices cause a cyclical broadcast storm occurrence throughout an internetwork, resulting in a broadcast storm event across many different network areas. The devices communicating at the time closest to the start of the storm inside the data-trace analysis results may be the devices causing the event.

After the storm has been located, the Relative Time field should be zeroed out and the storm should be closely reviewed by examining all packets or frames involved in the storm. If 500 or 1,000 frames are involved, all frames should be closely examined by paging through the trace. After the end of the storm has been located, the time between the start of the storm and the end of the storm should be measured by using a relative time process. This is achieved by just zeroing out the relative time at the beginning of the storm occurrence and examining the cumulative relative time at the end of the sequence. This provides a clear picture of the storm device participation and processes, the packet-size generation during the storm, and the source of the storm location. The initial several packets located for the broadcast storm should be investigated for the physical, network, and transport layer addressing schemes that may relate to the storm occurrence. This helps an analyst to understand the sequence of the storm event.
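The relative-time measurement described here can be expressed compactly. In this sketch the storm frames are assumed to have been isolated already and exported as `(absolute_time_s, source, size_bytes)` tuples; the function and key names are illustrative only.

```python
from collections import Counter


def summarize_storm(storm_frames):
    """Summarize a storm from its first frame to its last.

    storm_frames: non-empty, time-ordered list of
    (absolute_time_s, source, size_bytes) tuples.
    """
    start = storm_frames[0][0]               # zero the relative time here
    duration = storm_frames[-1][0] - start   # cumulative relative time
    sources = Counter(source for _, source, _ in storm_frames)
    sizes = [size for _, _, size in storm_frames]
    return {
        "duration_s": duration,
        "frames": len(storm_frames),
        "frames_per_source": dict(sources),
        "average_size_bytes": sum(sizes) / len(sizes),
    }
```

The per-source counts show device participation, and the duration together with the frame count gives the sustained frame-per-second rate of the storm.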

This is an extremely important process in network baselining and should be engaged in proactive and reactive analysis. In proactive baselining, an analyst must configure the proper broadcast storm thresholds on the protocol analyzer. This way, the storm events will show during the network baseline session. In a troubleshooting (reactive) event, it is important to know whether certain failure occurrences or site network failures are also being reported by the users; these may relate to the time of the storm occurrence. If this is the case, just isolating and identifying the broadcast storm may make it possible to isolate the devices causing the storm or the protocol operations involved. It may then be possible to stop the storm occurrence. This will increase performance levels and optimize the network.

Network Capacity Overload Analysis

When examining utilization, it is important to understand both the available capacity on any network medium and actual achieved utilization levels from an average, peak, and historical perspective. Every network LAN or WAN topology has an available capacity. Determining the utilization levels of a topology is important, but equally important is identifying any problematic utilization levels or saturation utilization levels. As discussed earlier, saturation of any main network medium can cause outages on a network related to an end-to-end session. Peak utilization and time measurement methods must be used to identify any outages.

Other conditions exist when the capacity, even if available, may be in an overload condition in certain topologies.

Consider, for example, a 10Mbps shared media Ethernet topology operating at utilization levels of 60% or higher. The Ethernet topology in a shared configuration normally allows for a specific maximum capacity of 10Mbps or 100Mbps. Can the shared Ethernet medium sustain the applied utilization levels and continue to operate reliably? Although capacity levels may only be operating at a peak transition of 60% or 70%, and approximately 30% to 40% of the medium may appear available, the CSMA/CD mechanism of shared Ethernet could trigger an excessive collision problem at this level. As noted later in this book, in shared Ethernet media the collision-detection mechanism can increase to a level that causes problematic events at the physical level when utilization exceeds 30% of available capacity. In this example, a level as high as 60% of the available capacity can constitute a network overload condition.
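To make the overload test concrete, the sketch below converts byte counts per sampling interval into percent utilization and lists the intervals that exceed a chosen threshold. The function name and sample layout are hypothetical, and the calculation counts only the bytes observed, ignoring framing overhead such as preamble and interframe gap.

```python
def overload_intervals(samples, capacity_bps, threshold_percent):
    """Return (start_time, utilization_percent) for overloaded intervals.

    samples: list of (start_time_s, interval_s, bytes_observed).
    capacity_bps: nominal capacity of the medium in bits per second.
    threshold_percent: overload threshold chosen for the topology.
    """
    alarms = []
    for start, interval, byte_count in samples:
        utilization = byte_count * 8 * 100 / (capacity_bps * interval)
        if utilization > threshold_percent:
            alarms.append((start, utilization))
    return alarms
```

For the shared Ethernet case above, calling it with `capacity_bps=10_000_000` and `threshold_percent=30` would flag a one-second sample of 750,000 bytes (60% utilization) but not one of 250,000 bytes (20%).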

With most network analyzers and management systems, an analyst can set a threshold that will immediately identify whether a specific LAN or WAN is in overload. The thresholds of certain internetwork management systems are specific to switches, hubs, and routers, and usually facilitate this process.

For the purposes of this discussion, a protocol analysis approach is followed. When performing a network baseline, the protocol analyzer should be preset with a network overload threshold setting (if this option is available). This feature is usually found in the threshold setting mode of an artificial intelligence-based Expert system. An analyst should determine whether a network overload threshold setup feature is available prior to a baseline session. The next focus is the exact type of topology and protocol sequencing being examined. A 16Mbps Token Ring network requires a different overload threshold setting than a 10Mbps Ethernet environment requires.

Another factor to consider is the type of application traffic and NOS environments that are deploying various protocols across the architecture. The combined topologies and protocols create a specific architecture that must be considered when assessing an overload condition during the network baseline process. In a network continually sustaining a 50% utilization level, for example, setting an alarm below this level will trigger alarms constantly or will cause already well-known information to be logged continuously. Presetting the threshold is a somewhat intuitive process on the part of the analyst. The message here is that the analyst must understand the type of topology and protocol environment deployed and determine what type of condition will cause a utilization overload of the available capacity. Figure 5.4 illustrates the concept of analyzing a network overload event.

Figure 5.4: Analyzing a network overload event.


A dedicated-circuit WAN with a fractional T1 link engaging a 256K circuit provides another example. You should not continue to run data across the circuit at 80% to 90% capacity. This type of level could cause excessive retransmissions and overflow of some of the buffers in the end-to-end router platforms between two specific wide area sites. If utilization consistently reaches 80% to 90%, even though 10% or more of the capacity still appears available, it would be better to upgrade the circuit to increase performance levels. The other factors involved in making this decision would be the type of router technology, the type of protocols, and the consistency of this traffic level.

A protocol analyzer example illustrates this technique. A protocol analyzer can be deployed across a certain network topology. When the network baseline session is initially configured, a threshold can be set for the proper overload condition. This is the alarm that occurs when the overload event happens. If the network baseline process is active, and the analyst encounters a network overload alarm, the protocol analyzer should then be stopped and the capture should be saved. The analyst should note the network overload event as to the time of occurrence and the type of devices involved. The data-trace capture should then be reviewed by isolating high-utilization occurrences within the trace. Some, but not all, network analyzers enable an analyst to turn on network utilization as a metric within the data-trace view results. The key is to properly mark the absolute time of the occurrence from the analyzer Expert system or the management system. The Absolute Time field should be turned on within the data-trace capture results.

Whether the network overload is located through a hotkey filtering system or absolute time, the overload occurrence should be closely examined. Most likely, a set of devices is involved in communication when the network overload occurrence takes place.

The network utilization column in the data trace should be examined and noted. The internal trace results should also be closely examined for the type of packet size used during the data movement when the utilization overload condition occurred. As noted earlier, utilization is a component of data size and rate of data movement. If an overload condition of 90% occurs when packet sizes are increased above 2K from an area of 100 bytes, this clearly indicates that larger blocks of data are present at a consistent data rate (increasing utilization on the medium). The actual protocol event sequences can then be examined for the cause of the overload. Based on the start time of the overload occurrence identified within the data trace, it may be possible to note the data-trace events in the first several packets identified at the time of the occurrence. Several features can be activated in the protocol analysis detail review to examine information such as Hex or ASCII data view of packet internals to identify the opening of a certain type of file. The application layer protocol could also be examined for a specific file sequence that has been opened. By identifying the types of files opened and the protocol events occurring at the time of the network start sequence, an analyst can relate the utilization overload to a specific application operation or occurrence on the network—a server synchronization event, a unique application launch, or a communication cycle such as database transfer, for example.
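The relationship between packet size, packet rate, and utilization is simple arithmetic, shown here as a small sketch. The function name is hypothetical, and the figures ignore framing overhead, so an analyzer's own utilization column will read somewhat higher.

```python
def utilization_from_rate(packet_size_bytes, packets_per_second, capacity_bps):
    """Return percent utilization for a steady stream of equal-size packets."""
    bits_per_second = packet_size_bytes * 8 * packets_per_second
    return bits_per_second * 100 / capacity_bps


# At a constant 500 packets per second on a 10Mbps medium (illustrative values):
small = utilization_from_rate(100, 500, 10_000_000)    # 100-byte packets
large = utilization_from_rate(2048, 500, 10_000_000)   # 2K packets
```

With these illustrative values, the 100-byte stream occupies 4% of the medium while the 2K stream at the same packet rate occupies roughly 82%, which is why a change in block size alone can push a segment into overload.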

Figure 5.5 shows how changing the size of data movement affects network utilization.

Figure 5.5: Changing the size of data movement.


It is critical to perform network capacity overload analysis during network baselining. An analyst can use this technique both reactively, for emergency troubleshooting, and proactively, to examine capacity overloads.

Network Throughput Analysis

When performing a network baseline, effective throughput should always be considered a standard measurement. The most accurate way to perform effective file throughput (EFT) analysis is to measure EFT against exact file transfer. General throughput measurements can also be taken against a specific network LAN or WAN area.

A specific technique applies to performing throughput analysis consistent with protocol analysis methodology in a network baselining session. This technique involves cross-mapping throughput measurements obtained from a protocol analyzer statistical or Expert screen with the throughput within the actual data-trace results.

When performing throughput analysis, it is very important to mark the beginning of a file open and a file close sequence. This is because after a file transfer begins, the transfer of data units in packets between two specific devices is usually a consecutive process and data movement can be consecutive and rapid.

Marking the file open and close sequences helps the analyst to determine the throughput across the internetwork channel in a LAN or WAN area. The following discussion focuses on measuring EFT with a protocol analyzer.

Prior to the study, the protocol analyzer deployed on the network should be set up for a network baseline study, and any Expert system threshold in an artificial intelligence–based protocol analysis or statistical screen threshold should be set. Several effective throughput levels are standard. During the network baseline study, the analyst should be familiar with the internetwork architecture and aware of the throughput typically achieved in the particular internetwork architecture. After the standard throughput levels for the specific network have been determined, they should be entered into the Expert system.

If the required level of effective throughput for a network baseline study is 500Kbps, for example, this threshold level should be set in the protocol analyzer prior to a capture session. After the 500Kbps threshold has been set, the protocol analyzer triggers an alarm, or flags the event on a statistical screen, when a file transfer drops below the 500Kbps mark while a capture is active. The analyst should mark the time of the effective throughput drop and note any devices involved in the transfer during the drop. It is next important to identify the timing of the actual low effective throughput alarm occurrence. Most analyzers include an absolute time mark in the statistical output screen or the Expert analysis screen. With these items noted, a low EFT event can then be cross-mapped to the internal area of the trace by using the effective throughput measurements discussed earlier in this book.

After the trace analysis data has been saved and the detail trace opened, it is then possible to view the data and locate the low EFT occurrence within the data by cross-mapping the statistical absolute time or hotkey filtering to a low effective throughput event.

When a low effective throughput occurrence is found within the trace, it should be verified by activating the relative time and cumulative bytes metrics. Dividing the cumulative bytes moved by the relative time elapsed gives the effective throughput achieved for the dataflow operation.
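One way to read the verification step above is as cumulative bytes divided by relative time. The following minimal sketch assumes that 1Kbps means 1,000 bits per second; the function name and the sample figures are hypothetical, and only the 500Kbps threshold comes from the example in the text.

```python
def effective_throughput_kbps(cumulative_bytes, relative_seconds):
    """Effective throughput in Kbps, assuming 1 Kbps = 1,000 bits per second."""
    if relative_seconds <= 0:
        raise ValueError("relative time must be positive")
    return cumulative_bytes * 8 / relative_seconds / 1000


# Hypothetical transfer: 250,000 bytes moved in 5 seconds of relative time.
eft = effective_throughput_kbps(250_000, 5.0)

# Compare against the 500Kbps threshold used in the example above.
below_threshold = eft < 500
```

The relative time should be zeroed at the file open frame and read at the file close frame, so that the byte count and the elapsed time cover the same dataflow operation.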

Figure 5.6 displays the technique of measuring EFT.

Figure 5.6: Measuring EFT.

It is also possible to trace throughput on a consistent basis by setting several protocol analyzers or management systems at specific threshold levels. This is another unique technique, because it enables the analyst to consistently capture the throughput levels of various devices communicating across an internetwork. Most protocol analyzers can develop multiple sets of views against certain devices communicating with application- or connection-based protocols within a LAN or WAN topology. When specific devices are being tracked in a statistical or Expert screen of a network analyzer, it is possible to monitor the ongoing effective throughput by examining statistical screens.

Measuring effective throughput is not only valuable for low effective throughput occurrences, but also for ongoing tracking of file transfers. The technique for this process is to set up the thresholds correctly and to consistently view the specific devices being reviewed for throughput during the baseline.

An example is to monitor the client stations related to the most critical servers at the site. Specific applications could be launched and the throughput could be compared for different applications. This enables an analyst to determine the data movement and the data-rate for standard throughput; it also enables an analyst to compare application PDU inputs within different application profiles. This identifies throughput differences in certain areas of the internetwork and also the throughput differences related to certain application sequencing.

This is a critical technique that should be followed in proactive baseline studies. This process can also be used in reactive analysis for low throughput or performance problems.

Network End-to-End Interpacket Timing Analysis

Network communication involves at least two devices communicating with each other: at a minimum, a workstation and a server, or one server and another server.

Specifically, any two devices communicating across an internetwork define an end-to-end communication channel. A data communication sequence occurs across the network channel: data is processed and a data transfer is achieved.

It is possible to use protocol analyzers and network management systems to examine the network end-to-end timing between two specific devices. As discussed under the timing section in the last chapter of this book, several different timing metrics or statistics can be activated in a protocol analyzer when measuring this process. Some of these metrics or statistics include absolute, delta, and relative time.

A specific technique applies, however, when examining the timing of communication between workstations and servers in a network baseline session. This is accomplished by determining the time difference between Request and Response sequences.
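The Request and Response time difference can be illustrated with a short Python sketch. The `Frame` record, its field names, and the station names are hypothetical stand-ins for what an analyzer would export; the sketch also assumes a simple pattern in which each request is answered before the next request is sent.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    time_us: int   # absolute capture time, in microseconds
    src: str
    dst: str
    kind: str      # "request" or "response" (hypothetical labels)


def response_times_ms(frames, client, server):
    """Pair each client request with the next server response; return delays in ms."""
    delays = []
    pending_us = None
    for f in frames:
        if f.kind == "request" and (f.src, f.dst) == (client, server):
            pending_us = f.time_us
        elif (f.kind == "response" and (f.src, f.dst) == (server, client)
              and pending_us is not None):
            delays.append((f.time_us - pending_us) / 1000)
            pending_us = None
    return delays


# Hypothetical four-frame exchange between a workstation and a server.
trace = [
    Frame(0, "ws1", "srv1", "request"),
    Frame(4_000, "srv1", "ws1", "response"),
    Frame(100_000, "ws1", "srv1", "request"),
    Frame(118_000, "srv1", "ws1", "response"),
]
delays = response_times_ms(trace, "ws1", "srv1")
```

Each value in `delays` corresponds to the delta time an analyst would read between a request frame and its response frame.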

A workstation and server across any topology usually invoke a specific protocol sequence to transfer information, and may engage another protocol sequence to call on a specific server file and so forth. Later in this book, there are extensive discussions on specific protocol sequencing methods for different protocol types and suites such as Novell and TCP/IP. For the purposes of this discussion, it is important to understand that there is an exact technique for measuring end-to-end interpacket timing that enables an analyst to quickly determine how various workstations and servers are responding to each other in terms of performance across the internetwork. A protocol analyzer is used here for illustration.

When using a protocol analyzer for end-to-end interpacket timing analysis, it is first important to set the proper thresholds that will trigger alarms when an excessive response time occurs between a workstation and server. Many network analyzers and management systems have threshold settings that set off an alarm when a server or workstation responds over a certain time period. Normal internetwork end-to-end channel communications on a LAN with multiple segments should take no more than 10 to 15 milliseconds (ms), and in fact it is common to see response time well under 5ms on most internetwork infrastructures. Across WAN infrastructures, interpacket latency for end-to-end sessions can be as high as 50ms. A response time of less than 20ms is more typical, however. Some WAN designs are based on different deployment design criteria, such as whether servers are centralized or decentralized across an infrastructure. The server positioning design must be considered when deploying several protocol analyzers against the end-to-end channel.
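As a rough illustration of such a threshold check, the sketch below flags response times above the upper figures quoted above (15ms for a multisegment LAN, 50ms across a WAN). The dictionary, function name, and sample measurements are hypothetical, and appropriate limits depend on the design of the specific internetwork.

```python
# Upper limits taken from the figures quoted in the text; tune per network.
THRESHOLD_MS = {"lan": 15.0, "wan": 50.0}


def excessive_responses(samples, path_type):
    """Return the (device_pair, response_ms) samples that exceed the limit."""
    limit = THRESHOLD_MS[path_type]
    return [(pair, ms) for pair, ms in samples if ms > limit]


# Hypothetical response times measured for two workstation/server pairs.
samples = [(("ws1", "srv1"), 4.0), (("ws2", "srv1"), 22.5)]
lan_alarms = excessive_responses(samples, "lan")
wan_alarms = excessive_responses(samples, "wan")
```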

An example of this process includes a multiple protocol analyzer data-capturing session against an Ethernet end-to-end channel that involves three segments including a WAN. For example, a multiple analyzer positioning "map" relative to the complete internetwork layout can be drawn up. Segment 1, for example, involves a shared Ethernet segment, which connects to a router and then connects to a WAN Frame Relay circuit. That circuit interconnects another WAN site noted as the remote location with an additional router, which is connected to another shared Ethernet segment, Segment 2. This mapping involves the following three specific network areas that must be traversed:

Segment 1
The WAN Frame Relay cloud, and
Segment 2

With this example in mind, at least four protocol analyzers are required to examine an end-to-end channel communication session. One protocol analyzer could be placed on the Segment 1 location in the shared Ethernet area. A second protocol analyzer could be positioned at the exterior side of the WAN router connected to Segment 1 and connected to the Frame Relay cloud or the Frame Relay Assembler/Disassembler (FRAD) link. A third protocol analyzer could be positioned at the remote site on the exterior side of the WAN router connected to the FRAD link at the Segment 2 site, and a fourth protocol analyzer could be connected to the remote shared Ethernet Segment 2. This setup places four network protocol analyzers in a parallel mode for synchronized analysis.

Figure 5.7 shows how end-to-end channel analysis can be used in a network baseline study.

Figure 5.7: Engaging in end-to-end analysis.

The next step is to synchronize all the protocol analyzers in terms of date and time to ensure that all the relative base measurements for time in each analyzer platform are as close as possible. Some network protocol analyzers offer synchronization of time through GMT synchronization, depending on the manufacturer. Other in-band and out-of-band triggering mechanisms in different analysis tools also enable an analyst to configure synchronization.

After the analyzers have been engaged, application characterization techniques can then be deployed against a network end-to-end channel being timed. An application can be launched from a local workstation at one segment to access a server at a remote site. This would be valid if a distributed server is placed at the remote site. The application, when launched, can be monitored by all four protocol analyzers. With absolute time set as active, it is possible to trace a specific packet, such as a TCP or a Novell SPX packet, by monitoring the sequence number as it travels from the source workstation to the remote server.

It is also possible to track the TCP or SPX response by examining the sequence and acknowledgment number as well as the identification fields of the networks traversed in the network and transport layer protocols of each packet.

By tracking the packets sent and received, the end-to-end network channel interpacket timing measurements can be determined. An analyst can make delta time active on the Segment 1 analyzer to examine the outbound request from the original workstation and the inbound server response. When only the first analyzer is used, however, the time spent across Segment 1, across the WAN Frame Relay cloud, and across the remote Segment 2 is all compiled into the remote response time. By using several analyzers, an analyst can determine how much time is spent on Segment 1, how much is related to the internal turn time of the workstation or delta time on Segment 1, how much time is spent across router one at the local site, how much time is spent traversing the Frame Relay cloud, how much time is spent traversing remote router two at the remote site, and how much time is spent accessing the server on the remote Ethernet segment at the remote site. An analyst can actually determine the amount of time spent in the server, the server turn time, as well as the server delta time at the remote site.
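The multiple-analyzer breakdown can be sketched as simple subtraction of the times at which the same packet is seen at each capture point. The sketch assumes the four analyzers are time-synchronized as described earlier; the analyzer labels and microsecond timestamps are hypothetical.

```python
def one_way_breakdown(seen_us):
    """seen_us: (analyzer_name, timestamp_us) pairs in path order for one packet.

    Returns the time spent between each pair of adjacent capture points.
    """
    return [
        (f"{a} -> {b}", t_b - t_a)
        for (a, t_a), (b, t_b) in zip(seen_us, seen_us[1:])
    ]


# Hypothetical sightings of one request and its response at four analyzers.
request = [("seg1", 0), ("wan_local", 600), ("wan_remote", 12_600), ("seg2", 13_100)]
response = [("seg2", 15_100), ("wan_remote", 15_600), ("wan_local", 27_600), ("seg1", 28_200)]

outbound = one_way_breakdown(request)
inbound = one_way_breakdown(response)

# Server turn time: request seen on Segment 2 until response seen on Segment 2.
server_turn_us = response[0][1] - request[-1][1]

# Total response time as the Segment 1 analyzer alone would report it.
total_us = response[-1][1] - request[0][1]
```

The per-hop values and the server turn time add up to the single response time visible from Segment 1, which is why one analyzer alone cannot show where the delay occurs.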

Timing analysis is an extremely valuable technique, but it must be very carefully configured and engaged. By applying this technique, an analyst can examine the appropriate analyzer screens and cross-map high response time or long acknowledgment time errors, as well as other conditions such as slow responding servers. The analyst can check these occurrences against actual timing events in the internal data-trace analysis results. This technique enables the analyst to accurately isolate the cause and results in extremely on-target information.

This process can be coupled with multiple application-characterization techniques, such as launching certain events. It is then possible to determine delays that may be related to the application processing or delay event sequences that take place when packets are traversing routers or switches on large internetworks.

Various tools enable an analyst to process data scripts across an internetwork based on this sequence.

Some of the most valuable tools available for this type of process include the Chariot platform offered by Ganymede Software and Optimal Software's Application Expert. The Chariot tool allows for immediate generation of data scripts across the internetwork; this will quickly show defined data movement, and allows for viewing transaction time, throughput, and response time operation. Chariot tools can be placed at multiple network points.

Network analyzers can also be placed at the same points to extract data. Optimal's Application Expert provides response-time prediction and data analysis in a multipositioning process, but adds a feature to examine an application's thread across an internetwork.

Many different management systems offer unique platforms and features that can engage tests similar to Ganymede's Chariot and Optimal's Application Expert.

The key factor to remember is that timing measurement is a valid process and should be configured and performed very carefully. By so doing, an analyst can isolate delays and perform specific optimization tuning against internetwork transfer.

This is a valid technique and should be used consistently by an analyst in both proactive and reactive analysis situations.

Transport and File Retransmission Analysis

When conducting either a proactive or reactive network baseline study, it is important to monitor the transport layer and application layer for retransmission events. A retransmission occurrence can cause an inefficient situation in terms of final data communication. A retransmission is a redundant dataflow event that adds to the amount of cumulative bytes transferred on a network to effect a final transmission of an intended application process or NOS data.

The specific purpose of the transport layer mechanism operation is to ensure connection. Most transport layer protocols engage a connection process that invokes a sequence-and-acknowledgment cycle between two specific end nodes. This sequence-and-acknowledgment maintenance cycle causes, in certain cases, a higher transmission of redundant data. This is common if there are delays or other abnormal occurrences in the communication channel between two end devices. Later in this book, an extensive discussion is presented on various types of transport layer protocol operations and processes.

For the purposes of this discussion, the main function of the transport layer in network communication protocol sequencing is to ensure that there is a transport mechanism in place for transferring data from specific protocol port areas within certain workstations and hosts across an internetwork. In certain cases, the transport layer also generates packet traffic that allows the transport channel created to be maintained for connection.

Protocol analysis enables an analyst to monitor the transport channel that has been established between a workstation and a file server for general operation and transfer of data across the channel. The analyst can also monitor the transport layer for recurrences of transmission of data considered redundant. In this case, the focus would be monitoring transport layer retransmissions.

Because the transport layer is essentially in place to establish a network channel for data port communication and, at times, to maintain the communication channel in a consistent and reliable state, it is critical to monitor this area through a network baseline process to ensure connectivity.

If a transport layer connection is established with TCP, two specific ports for TCP may be opened on a respective end-to-end channel on an internetwork. The endpoints could be a workstation and a file server, each of which would have a TCP port open. The transport layer protocol TCP would maintain the connection. Polling events between the two protocol ports in the two devices would take place that would allow for maintaining the TCP port transport layer channel that is open related to the port transmission activity.

With the aid of a protocol analyzer, an analyst can monitor this channel. If a delay is present in the internetwork, or if one of the two devices encounters a problem, the end-to-end communication may be negatively affected. Again, this might result from internetwork delays or because one of the devices has an internal issue related to resource handling or processing of data. When this occurs in the TCP environment, the transport layer may show extensive retransmissions of data because it is attempting to ensure that the data required for achievement of an application process is sent on a continuous basis. When retransmissions occur, redundant data is sent. One of the endpoints—for example, the workstation—could encounter a situation in which it is not receiving required data responses from the server because of delays. As this occurs, it is likely that the workstation will re-request the data on a continual basis. Under certain conditions, the host may retransmit the data for integrity purposes, based on the inherent operation of the transport layer protocol TCP.

Figure 5.8 shows retransmission analysis being engaged.

Figure 5.8: Engaging in retransmission analysis.

After a protocol analyzer has captured the retransmissions, an analyst can mark the percentage of data related to retransmissions as related to standard data process transmissions to understand the retransmission level. An extremely high retransmission level, such as 20% or 30% of all frames transmitted, shows an excessive level of redundant communication.
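A minimal sketch of the percentage calculation follows. It assumes a simplified rule in which a data-carrying segment counts as a retransmission when the same source, destination, and sequence number have already been seen; real analyzers apply more refined logic, and the tuple layout and sample values are hypothetical.

```python
def retransmission_percent(segments):
    """segments: (src, dst, seq, payload_len) tuples in capture order.

    Returns retransmitted data segments as a percentage of all data segments.
    """
    seen = set()
    total = retransmitted = 0
    for src, dst, seq, payload_len in segments:
        if payload_len == 0:
            continue  # pure acknowledgments carry no data to retransmit
        total += 1
        key = (src, dst, seq)
        if key in seen:
            retransmitted += 1
        else:
            seen.add(key)
    return 100.0 * retransmitted / total if total else 0.0


# Hypothetical capture: five data segments, one of which repeats sequence 2000.
segments = [
    ("srv1", "ws1", 1000, 1000),
    ("srv1", "ws1", 2000, 1000),
    ("ws1", "srv1", 500, 0),      # acknowledgment only, ignored
    ("srv1", "ws1", 2000, 1000),  # same sequence number again
    ("srv1", "ws1", 3000, 1000),
    ("srv1", "ws1", 4000, 1000),
]
percent = retransmission_percent(segments)
```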

This example clearly shows how transport layer retransmissions can become an issue. The simplest way to monitor transport layer retransmissions is to use a protocol analyzer.

Application-Based File Retransmission

Application-based file retransmission is another type of retransmission event that can take place in application operations when communication occurs between workstations and servers. In most cases, a complete file retransmission is invoked by application operational sequencing, or developers may intentionally code it into the application process for redundant cycles to enable integrity checks in the application transfer.

When a file is sent across a network, a file open and a file close sequence is usually invoked. A protocol analyzer enables an analyst to monitor how often files are opened and closed. If some files are opened and closed on a consistent basis in consecutive order, the application code or the application process may very well be invoking the file retransmissions. When this type of situation occurs, it is critical to monitor the event with a protocol analyzer. Doing so enables the analyst to identify the inefficiently designed applications causing redundant excessive communication on a network.

To monitor transport retransmissions, the network protocol analyzer should first be set for the appropriate thresholds to capture this error. Some network protocol analyzers enable the analyst to set up a predetermined threshold or filter for this type of event. After the network baseline session has been started, if any of the transport layer retransmissions alarms are triggered, the error occurrence and time can then be documented. A hotkey filtering system can then be invoked after the file has been captured and saved. The hotkey filtering system can be engaged or the time of the transport layer retransmission can be marked from the appropriate analyzer screen. After the time event of the retransmission event has been identified and noted, the internal areas of the data capture can then be examined for the actual retransmission event.

In some cases, it is possible to mark the transport layer sequence and acknowledgment fields for the retransmission event. Some protocol analyzers highlight this alarm; with others, only a detailed analysis can identify this concern. The technique involved is to cross-map any statistical alarms on the analyzer from an Expert or statistical screen against the retransmission event inside the data trace.

After the retransmissions have been found, they should be correlated to any file open and close sequence prior to the retransmission. This process enables the analyst to identify the file being transferred and possibly the application involved. If the retransmissions are excessive, the number of retransmitted frames should be compared against the frame count for normal data transfer and documented. A percentage of retransmissions should be calculated against overall standard data transmission events. If retransmissions exceed 10% of overall traffic, this is a problematic level that could cause redundant communication and additional utilization on the medium. Retransmission levels as high as 20% and 30% could cause application integrity issues and/or application failures. The level of retransmissions should be carefully noted.
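Correlating a retransmission with the file open that precedes it can be sketched as a lookup by frame number. The file names and frame numbers below are hypothetical, and the sketch deliberately ignores file close events, so it only reports the most recent open before the retransmission.

```python
import bisect


def file_open_before(open_events, retransmission_frame):
    """open_events: (frame_number, filename) pairs sorted by frame number.

    Returns the file most recently opened at or before the given frame,
    or None when no file open precedes it.
    """
    frame_numbers = [number for number, _ in open_events]
    i = bisect.bisect_right(frame_numbers, retransmission_frame) - 1
    return open_events[i][1] if i >= 0 else None


# Hypothetical file opens noted in a trace.
open_events = [(100, "ORDERS.DAT"), (900, "REPORT.DOC")]
suspect_file = file_open_before(open_events, 450)
```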

Figure 5.9 presents the concept of file fluency analysis.

Figure 5.9: File fluency analysis.

When monitoring application-based file retransmissions, it is also important to set up a protocol analyzer prior to starting a network baseline to capture any concerns related to this type of event. Some protocol analyzers enable an analyst to set up a filter or an Expert system threshold to trigger an alarm. If, during a network baseline session, a file retransmission alarm is triggered or a statistic is reported on a screen for an event, the analyzer can then be stopped and the data should be saved for review.

The event occurrence from the statistical or Expert screen should also be noted for the absolute time of occurrence and the devices involved in the file retransmission. It may also be possible in some circumstances to mark the protocol type or application invoked. Usually this is just noted as the protocol layer engaged and not the application type.

After the capture has been saved, the data trace should then be opened and the event should be located by hotkey filtering or matching of the absolute time of occurrence. After the file retransmissions have been located, the type of files being opened should be examined and, if possible, cross-mapped to an application engaged at the site. It is sometimes possible to turn on Hex or ASCII data display views with an analyzer to examine the internals of the file open statement (if the application layer does not report the file type, for instance). This enables an analyst to identify the application access that is occurring.

Next, the file open and close process should be monitored for how often the file is opened and closed in a consecutive manner. A file opened and closed once is a normal event. If a file is opened and closed 25 consecutive times with the same data being transferred, this is an excessive file retransmission event. The application development team should be contacted to investigate whether this is an intended application event or whether this is an abnormal event. In most cases, this issue can be resolved by working in tandem with the application development and NOS server support teams.
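Counting consecutive open and close cycles for the same file can be sketched as follows. The input is assumed to be the sequence of file names in the order their open/close cycles appear in the trace; the file names are hypothetical, and the limit of 25 simply reuses the example figure above rather than a fixed rule.

```python
from itertools import groupby


def consecutive_open_counts(opened_files):
    """Collapse a sequence of opened file names into (name, run_length) pairs."""
    return [(name, sum(1 for _ in run)) for name, run in groupby(opened_files)]


def excessive_runs(opened_files, limit):
    """Return the runs in which one file was opened at least `limit` times in a row."""
    return [(n, c) for n, c in consecutive_open_counts(opened_files) if c >= limit]


# Hypothetical trace: one file opened once, another opened 25 times in a row.
opened = ["CONFIG.INI"] + ["ORDERS.DAT"] * 25 + ["CONFIG.INI"]
runs = consecutive_open_counts(opened)
flagged = excessive_runs(opened, limit=25)
```

A run flagged this way still needs the check described above, namely whether the same data is transferred on each cycle, before it is reported as a file retransmission.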

Figure 5.10 shows file access-error analysis.

Figure 5.10: File access-error analysis.

Both steps of examining transport and file retransmissions via network baselining are extremely important. These techniques assist in identifying redundant communication that can be eliminated from a communication session. These steps also assist in identifying network-based foundational or application process issues.

Path and Route Analysis

During a network baseline, it is important to always investigate the route of packet transfer across a network or internetwork channel. It is also important to determine the number of hops or paths taken across internetwork interval points when transferring information.

The use of protocol analysis in this area is an extremely valuable process. Path route and path count analysis can be used in proactive and reactive network baselining sessions. To examine path route concerns, it is important to understand that within network layer protocols, and also internal to some of the data involved above the physical layer inside a packet, there is information that may identify the route that a packet has taken through an internetwork.

When examining a simple Token Ring frame, for example, it is very common to examine a Routing Information field. The Routing Information field includes specific information as to the bridge, router, or switch that has been traversed, and how many hops have been taken related to the Token Ring packet. An analyst can use a protocol analyzer to capture this information.

Figure 5.11 illustrates route and path analysis in a Token Ring network.

Figure 5.11: Route and path analysis.

Several network layer protocols can also be examined—such as IP, IPX, and even Datagram Delivery Protocol (DDP) in the AppleTalk environment—for a Hop Count field. A Hop Count field can be examined and reviewed for a quick view of how many routes have been traversed. It is important to associate the hop count with the point of capture and where the packet is traveling through the internetwork.

Certain network layer protocols, such as IP, provide even more information. IP provides a combined Hop Count and Time in Transit field, which reflects how long a packet has spent on a network in terms of seconds and hops traversed. This combined field is the time-to-live (TTL) field. This field is discussed in detail later in Chapter 7, "LAN and WAN Protocols."
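Because each router decrements the TTL by at least one, the value observed at the point of capture gives a rough hop estimate. The sketch below is a heuristic, not a method from the text: it assumes the sender started from one of a few commonly used initial TTL values (64, 128, or 255), which is not guaranteed for every host.

```python
# Assumed common initial TTL values; actual senders may use others.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimated_hops(observed_ttl):
    """Estimate routers traversed from the TTL seen at the capture point."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


# Hypothetical observation: a packet captured with a TTL of 122.
hops = estimated_hops(122)
```

The estimate should be checked against the known topology, because it depends on the point of capture and on the assumed starting value.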

The technique of examining packet, route, packet count, and path count is extremely important and can become intuitive for an analyst.

Most protocol analyzers enable an analyst to immediately examine the internals of a packet or frame in a detailed manner. Some protocol analyzers have alarm settings for excessive hop counts or path routes traversed. The protocol analyzer should be set up for any alarms or thresholds required.

After the capture has occurred, if any events take place in a network baseline session, the information related to a path count exceeded or a hop count exceeded field should be noted. The absolute time, the occurrence of the event, and the devices involved should also be noted. The capture should then be stopped and saved. When the data trace is opened, the event should be located through absolute time cross-mapping or hotkey filtering. When the frames showing the event are located, the frames should be opened in a detailed mode. The information related to path route or hop count should be examined. In most cases, this examination focuses on the network protocol involved. Most network layer protocols usually just display the source network, source node, and sometimes the socket or protocol port being communicated to across the internetwork. By examining the source network relative to the destination network, the hop count can be determined for the data communicated across a set of routes.

In a flat internetwork, such as a switched Ethernet environment, a hop count marking in a Hop Count field within a network layer protocol may not be valid or exist. By examining the source and destination network, however, an analyst can determine whether a device is located across an Ethernet end-to-end channel that involves two, three, or four switches. Again, it is extremely important to understand that, in most instances, the network layer protocol holds this information.

Figure 5.12 illustrates route analysis of an Ethernet network.

Figure 5.12: Route analysis of an Ethernet network.

The source and destination networks involved should then be marked, and then the devices should be mapped against the physical topology of the network. By just locating the source and destination network, the devices can be reviewed against a physical architecture or topology map. By locating where the devices are positioned on the network and associating this with the route taken, it is sometimes possible to identify inefficient design or incorrect implementation of servers and workstations across an internetwork. Remedies can then be taken, such as relocating a server or reconfiguring switches or routers in such a way that fewer routes are traversed.

At times, excessive multiple-route occurrences can be identified. In some cases, routing loops or delays can be identified in this sequence. Sometimes a passthrough route through a server that provides for IP forwarding or has multiple NICs can be located through this technique.

This is extremely important and allows for route protocol analysis to be used to examine the route that a packet actually takes and the number of hops that the packet encounters when traversing the internetwork from the source to the destination.

The ability to map this to an actual sequence flow or roadmap taken through the internetwork is an invaluable process to a network baseline study, in both a proactive and reactive mode.

End-to-End File Transfer Analysis

Another important technique in network baselining is end-to-end file transfer analysis. File access is the main operation of an internetwork. As mentioned earlier, a protocol analyzer can be used consistently to mark a file open and a file close sequence. An analyst can review when files are opened and closed, and can identify the source and destination devices related to the transfer. As discussed earlier, the analyst can examine the internal time sequences between a file transfer and mark the timing related to the file transfer event as well as the amount of data moved.

Taking into account the techniques presented in the application discussion of this book, it is next important to note that a simple technique should be followed during every network baseline study to examine end-to-end file transfer. When performing a network baseline, even with the absence of errors, it is important to constantly monitor file transfer events when examining the internal levels of protocol analysis–based traces captured during a session.

A simple way to do this is to ensure that any protocol analyzer or network management tool used is set up for the appropriate thresholds that will enable identification of any abnormal occurrences of this type. Some protocol analyzers enable an analyst to set up a threshold that will alarm when a certain file is opened and not closed in a relative amount of time.

Figure 5.13 illustrates a file transfer delay event being analyzed.

Figure 5.13: Analysis of a file transfer delay.

An analyst can also identify this process by setting specific alarms, such as alarms to a nonresponsive host environment or a slow responding server. In certain conditions, this may enable the analyst to immediately identify any abnormal end-to-end file transfers within a data capture.

A normal process is for a file to be opened, accessed, and closed. In some cases, a file is opened, accessed, and then abruptly stopped. Sometimes the event can be caused by delays on the network or failure of the NOS or failure of the application itself.

Often a device failure in the workstation or file server causes this event. It is important to examine this event on a consistent basis as a thread throughout many network baselining sessions in an enterprise internetwork study.

A protocol analyzer can be set up in a predetermined configuration prior to a network baseline session to allow for a threshold alarm to activate upon incorrect file closures or any nonresponsive conditions on workstations and servers.

After the protocol analysis network baselining session has been started, it is then possible to identify any alarms that may be triggered. If any alarms occur related to file transfer sequencing, the devices involved and the absolute time of the occurrence should be noted. After the capture has occurred, the data trace should be saved. The data captured should be opened and the abnormal file transfer located through a hotkey or absolute time cross-mapping sequence.

In the event that alarms are not triggered, paging through the trace enables the analyst to examine file open and close sequencing. If the error is located, the file open and close process can be examined. Once marked, the open and close points should be considered the start and stop points of a file transfer. The packets should next be examined from the open state to the close state for redundancy or abnormal retransmission events. In some instances, the trace data will show an abnormal stop, and a file close may not even be present in the trace.
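When paging through a long trace, a simple pass over the file open and close events can point to transfers that never completed. The event tuples and file names below are hypothetical stand-ins for what an analyst would note from application layer frames.

```python
def unclosed_files(events):
    """events: (frame_number, action, filename) tuples in capture order,
    where action is "open" or "close".

    Returns (filename, open_frame) pairs for files still open at end of trace.
    """
    open_at = {}
    for frame_number, action, name in events:
        if action == "open":
            open_at[name] = frame_number
        elif action == "close":
            open_at.pop(name, None)
    return sorted(open_at.items(), key=lambda item: item[1])


# Hypothetical events: one complete cycle and one open with no matching close.
events = [
    (120, "open", "ORDERS.DAT"),
    (310, "close", "ORDERS.DAT"),
    (400, "open", "REPORT.DOC"),
]
suspects = unclosed_files(events)
```

A file reported here is only a candidate: the capture may simply have stopped before the close, so the frames after the open should still be examined for an abnormal stop.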

The key factor here is that normal consecutive file open and close sequencing is essential to a positive network communication process. It is also the main reason that we have a file open occurrence, an access event such as a data transfer, and a file close. Most of the time, the application layer protocol is monitored for this type of cycle. Application layer protocols include such types as NCP and Windows NT's SMB. Further information on application layer protocols appears later in this book.

In closing, it is important to examine end-to-end file transfer in every network baseline study session.

Trace Decoding Techniques

When performing a network baseline, the final technique is the actual method used by the analyst to identify specific events and perform trace decoding. Earlier in this book, a brief discussion was presented on a process termed "paging through the trace."

When moving through a data capture, the data must be examined on a consecutive basis and reviewed closely by carefully moving through the trace information. Large data captures can include multiple frame sequences that can involve thousands of frames. Some network baseline capture sessions involve anywhere from 30,000 up to 100,000 or more frames. Because of this, it is important to have a technique that allows for quick examination of the frame sequencing. One technique is to use an Expert or statistical screen for cross-mapping event errors and quick hotkey filtering to the area in the trace that shows the problem. At times, an Expert or statistical screen may actually show the frame number prior to stopping the capture when an error event occurs.

When performing a network baseline, careful notes should be taken from the Expert system or the statistical screen prior to stopping the capture. If critical information is noted, such as absolute time of occurrence, devices involved, or frame numbers related to error occurrence, an analyst can move through the trace more quickly after the capture has been saved for review. Even if an Expert system is not available, there are ways to page through the trace to obtain important information that is essential for a network baseline session. This is an extremely valuable step in network baselining.

Figure 5.14 illustrates the concept of trace note taking during a network baseline study.

Figure 5.14: Trace note.

The following is an example of using a protocol analyzer to page through the trace. First, the analyzer must be preconfigured for the proper metric thresholds to ensure that event alarms will trigger and show absolute time and areas of occurrence for frame category marking prior to the error event. Watching the protocol analyzer closely while the capture is occurring will also allow for marking this type of event.

After the capture has been started, the analyst should closely monitor the protocol analyzer. This is extremely important when performing reactive troubleshooting analysis for rapid cause isolation. If an error alarm for a retransmission is triggered on the analyzer statistical screen while frames are being received by the analyzer NIC, and the analyzer shows an inbound count of 5,325 frames captured, frame number 5,325 can be quickly noted as the approximate area of the data trace associated with the retransmission error.

The error and the devices involved with the error should be recorded in a statistical notebook, along with the inbound frame area within the trace. Thus, when the data capture is saved and the event examined, the analyst can swiftly move to frame 5,325 for cause isolation.
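The marking step in this example amounts to recording the inbound frame count at the moment an alarm fires. The Python sketch below shows one minimal way to keep such a log. The `CaptureLog` class and its method names are hypothetical; a real analyzer exposes its counters and alarms through its own interface, and the device names in the usage example are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CaptureLog:
    frames_in: int = 0                         # running inbound frame count
    marks: list = field(default_factory=list)  # notes taken during the capture

    def frame_received(self):
        """Call once for every frame received into the capture buffer."""
        self.frames_in += 1

    def alarm(self, kind, devices, when=None):
        """Record the alarm with the current frame count and absolute time."""
        mark = {
            "frame": self.frames_in,
            "time": when or datetime.now(),
            "alarm": kind,
            "devices": tuple(devices),
        }
        self.marks.append(mark)
        return mark
```

For example, if `frame_received()` has been called 5,325 times when `alarm("retransmission", ["Server1", "Client7"])` is called, the stored mark carries frame 5,325. Because the alarm may lag the frame that caused it, that number should be read as an approximate area of the trace, not an exact pointer.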

Figure 5.15 illustrates the concept of frame marking during a protocol analysis session.

Figure 5.15: Frame marking.

This is just one example of the kind of rapid, ongoing cross-mapping that takes place throughout a network baseline session. Again, the analyst must be extremely involved with the protocol analyzer while the capture is occurring for this technique to be helpful.

After the protocol analyzer has been stopped, whether an Expert system is active or inactive, the analyst can page through the trace to investigate certain data. If any events were located during the live capture session, the cross-mapping technique of reviewing the data associated with the noted frame numbers or absolute times of occurrence should be followed. Hotkey filtering should also be used when possible. This allows an Expert system or an artificial intelligence statistical screen to key quickly to the type of error within the data capture.
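When the saved trace is available as a list of decoded frames, the cross-mapping step can be approximated with two small filters: one keyed on a noted frame number and one keyed on a noted absolute time. The Python sketch below assumes each frame is a dictionary with `number` and `time` keys. That layout, the function names, and the default window sizes are illustrative assumptions, not features of any specific analyzer.

```python
from datetime import timedelta


def frames_near_number(trace, frame_number, before=20, after=5):
    """Return the frames in a window around a frame number noted during capture.

    The window reaches further back than forward on the assumption that an
    alarm is usually raised shortly after the frames that caused it.
    """
    low, high = frame_number - before, frame_number + after
    return [f for f in trace if low <= f["number"] <= high]


def frames_near_time(trace, when, slack=timedelta(seconds=1)):
    """Return the frames whose absolute time falls within `slack` of `when`."""
    return [f for f in trace if abs(f["time"] - when) <= slack]
```

Calling `frames_near_number(trace, 5325)` would return frames 5,305 through 5,330, which gives the analyst a small region to page through instead of the whole capture.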

After a problematic frame or packet has been found, certain types of information should be clearly recorded in a statistical mode by the consultant or analyst performing the network baseline.

An analyst should note information related to the following:

Address(es)
File access and name(s)
Time frame(s)
Frame structures for field length
Routing information
Specific protocol(s)
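The items above can be gathered into one record per marked frame so that every entry in the analysis log has the same shape. The following Python sketch shows one possible layout. The `TraceNote` class and all of its field names are hypothetical, and the addresses in the usage example are placeholder values.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TraceNote:
    frame_number: int                   # where the event sits in the trace
    absolute_time: str                  # time frame marking
    physical_addresses: tuple           # address marking: physical layer
    network_addresses: tuple            # address marking: network layer
    transport_ports: tuple = ()         # any transport layer ports opened
    file_name: Optional[str] = None     # file access and name marking
    frame_size: Optional[int] = None    # frame structure: total size in bytes
    payload_size: Optional[int] = None  # frame structure: data payload in bytes
    routing_info: Optional[str] = None  # route or path taken
    protocol: Optional[str] = None      # specific protocol marking
    remarks: str = ""                   # free-form notes on abnormal events


note = TraceNote(
    frame_number=5325,
    absolute_time="14:02:07.113",
    physical_addresses=("00:00:5e:00:53:01", "00:00:5e:00:53:02"),
    network_addresses=("192.0.2.5", "192.0.2.9"),
    protocol="SMB",
    remarks="retransmission alarm on statistical screen",
)
record = asdict(note)  # plain dictionary, convenient for saving or forwarding
```

Fields that cannot be determined in the session are left empty, which still lets the record be forwarded to someone who can identify the device or application involved.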

Address Marking

When examining a trace, it is first important to note the various levels of addressing. The physical address should be noted inside any packet that has an error. The network layer address should also be noted, along with any transport layer ports that are opened. This will assist in identifying any events related to an error frame that could be associated with certain devices or specific network applications.

An analyst might, at times, capture the trace information and identify a specific area of the trace that shows the problem. Depending on the analyst's experience and understanding of the internetwork, however, the analyst may or may not be able to identify the device or application associated with an error. By properly marking the information in a log and forwarding it on to an appropriate MIS person, it is quite possible that specific servers or applications can be pinpointed as the cause of the problem.

File Access and Name Marking

It is also important that the analyst note any file open and close sequences associated with error occurrence. After the file open and close sequences have been marked, the analyst should note the file being called in the application layer protocol, along with the hex or ASCII code that shows the file being opened. This information can then be forwarded to the appropriate MIS personnel, who may be able to identify the application associated with the error. Again, the analyst may not always understand which devices or applications are involved. By clearly noting the file sequencing, or the file that is opened and closed, in an analysis log, however, the analyst enables other MIS team members to identify the application, or the file server location in the internetwork, that is causing the problem.

Time Frame Marking

As discussed throughout this chapter, it is important to record the time frame of an error event. The absolute time frame of a packet is usually marked within the upper-layer physical header inside the detail level of a packet area. It is also possible to turn on the absolute time within the data-trace summary screen to review the absolute time of any error frame occurrence. Each time an error is noted in a baseline session, the network analyst should always record the time frame occurrence in a statistical analysis log.

Frame Structures for Field Length Marking

Occasionally, a network baseline analyst will encounter a frame structure error. Frame structures are usually designed for a specifically defined operation. For example, a typical Ethernet frame is based on a frame size ranging from 64 bytes to 1518 bytes. A 4Mbps Token Ring frame is limited to 4096 bytes. An analyst should always mark the frame size associated with any error frame occurrence captured.

Specifically, if an error frame is detected when examining a data trace, the frame size and protocol layer sizes should be noted, as well as the data payload. Each packet will always have a certain maximum transmission unit (MTU) size and specific amounts of data will be associated with the protocol layers. It is important to mark the payload of data transferred in any frames where errors occur. This information should be recorded in the analyst's statistical log.
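The size checks described in this section reduce to simple arithmetic. The Python sketch below uses the 64-byte to 1518-byte Ethernet range quoted above and assumes that the reported frame size includes the 4-byte frame check sequence. Some analyzers report sizes without it, in which case the limits would need to be adjusted. The function names are hypothetical. The header sizes in the usage example are the common minimums for an Ethernet II header, an IPv4 header without options, and a TCP header without options.

```python
ETHERNET_MIN = 64     # bytes, smallest valid frame (assumed to include the FCS)
ETHERNET_MAX = 1518   # bytes, largest standard frame (assumed to include the FCS)


def classify_ethernet_frame(size_bytes):
    """Label a frame size against the standard Ethernet range."""
    if size_bytes < ETHERNET_MIN:
        return "undersized"
    if size_bytes > ETHERNET_MAX:
        return "oversized"
    return "ok"


def payload_bytes(frame_size, layer_sizes):
    """Subtract the per-layer overhead from the frame size to get the data payload."""
    return frame_size - sum(layer_sizes.values())


# Usage example: a full-sized frame carrying TCP over IPv4.
layers = {"ethernet_header": 14, "fcs": 4, "ipv4_header": 20, "tcp_header": 20}
full_frame_payload = payload_bytes(1518, layers)  # 1518 - 58 = 1460 bytes
```

Recording the frame size, the per-layer sizes, and the resulting payload together makes it easier to see later whether an error frame was malformed or simply carried an unusual amount of data.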

Routing Information Marking

Another key area to monitor is the Routing Information field. As discussed earlier in this chapter, frames contain specific information that indicates to an analyst the route taken or the path hop count achieved. When performing a data-trace analysis session that involves cause analysis or standard network baselining for error mapping, the analyst should clearly mark any critical routing information, such as the route or path taken. If routing update sequences are captured between routers or switches, or other key devices on the network such as servers or hosts, the analyst should also record any routing update sequences when performing error-mapping analysis.

Specific Protocol Marking

Specific protocol marking is also important. This book does not discuss certain event occurrences. These are abnormal occurrences that do not fit any specific profiles, such as an abnormal error from a certain application, or an abnormal type of protocol event that has not been seen before in a network baseline session. If any abnormal event occurs, the analyst should document as much information as possible about the event. This would include the absolute time of occurrence, the devices involved, and any protocol sequencing taken.

When performing a network baseline session, it is extremely important to be in a constant review mode, which involves statistical documentation of the occurrences. Statistical documentation involves recording and documenting information prior to stopping the data capture. It is also vital that information be saved as a trace file or in comma-separated values (CSV) format when possible so that the information can be reviewed later in the network baseline process.
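As a minimal sketch of saving session notes in CSV form, the following Python example uses the standard `csv` module. The column names and the two helper functions are assumptions chosen for illustration. An analyzer's own CSV export will have its own layout.

```python
import csv
import io

# Hypothetical column layout for the statistical analysis log.
FIELDS = ["frame", "absolute_time", "devices", "protocol", "event"]


def notes_to_csv(notes, out):
    """Write a list of note dictionaries to an open text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for note in notes:
        writer.writerow(note)


def notes_from_csv(src):
    """Read the notes back; every value is returned as a string."""
    return list(csv.DictReader(src))


# Usage example with an in-memory buffer. For a real file, use
# open(path, "w", newline="") as the csv module documentation recommends.
buffer = io.StringIO(newline="")
notes_to_csv(
    [{"frame": 5325, "absolute_time": "14:02:07.113",
      "devices": "Server1;Client7", "protocol": "SMB",
      "event": "retransmission"}],
    buffer,
)
buffer.seek(0)
rows = notes_from_csv(buffer)
```

Keeping the same columns from session to session makes it possible to compare notes from one baseline session with those from another later in the study.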

Some of this information will be extremely critical to the final data-acquisition decoding and final reporting process of the network baseline study.

It should never be assumed that information is not required or is unimportant. All information should be documented in a concise and detailed manner. When performing a large internetwork baseline involving several baseline areas, such as 10 to 15 segments in an Ethernet environment, some of the information found in Baseline Session 1 may be relevant in Baseline Session 7. It is important that every network baseline session be consistently documented through statistical save modes for trace file captures, along with general note-taking and statistical marking during a trace analysis session.

Conclusion

It is extremely important that all the techniques discussed throughout this chapter be followed during a network baseline session. These techniques are extremely valuable when roving from session to session across large internetwork baseline studies. They are also valuable when performing a simple network baseline study. These techniques apply in both proactive and reactive analysis sessions.

When performing a network baseline, it is vital that a clear documented file on the network be available for reviewing topology maps, address assignments, and other key configuration information. The following chapter discusses the importance of network documentation during a baseline session and of cross-mapping the documentation to the results found in a trace analysis session.

About the Author

Daniel J. Nassar is the most renowned network analyst and baseline consultant in the global marketplace. As president and CEO of LAN Scope, Inc., an international network baseline consulting and training firm, he has engineered and performed more than 3,000 major industry network baseline studies on large-scale, critical networks for Fortune 100 companies. In addition to his LAN Scope responsibilities, Dan is president of Eagle Eye Analysis, Inc., a Philadelphia-based network consulting firm. At Eagle Eye, he evaluates industry products, such as network analyzers and new networking devices such as switches and routers. He provides remote evaluation and writing services direct to industry manufacturers, including documentation design for industry white paper reviews, training manual design, and overall product assessment. His extensive experience has led Dan to write four previous books, as well as more than 150 articles and 10 training courses on network performance baselining. He has also done more than 100 presentations at major tradeshows and has chaired five major industry tradeshow boards.

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages. All prices for products mentioned in this document are subject to change without notice. International rights = English only.
