DirectAccess Capacity Planning

James McIllece | Last Updated: 1/25/2017 | 5 Contributors

Applies To: Windows Server 2016

This document reports on Windows Server 2012 DirectAccess server performance. Testing was performed to determine throughput capacity on both high-end and low-end computer hardware. On both classes of hardware, CPU performance depended on network traffic throughput and on the types of clients used. A typical DirectAccess deployment (and the basis for these tests) consists of roughly 30% IPHTTPS clients and 70% Teredo clients. Teredo clients outperform IPHTTPS clients in part because Windows Server 2012 uses Receive Side Scaling (RSS), which allows use of all CPU cores. Because RSS is enabled in these tests, hyper-threading is disabled. In addition, TCP/IP in Windows Server 2012 supports UDP traffic, allowing Teredo clients to load balance across CPUs.

Data was collected from both a low-end (4 core, 4 Gig) server and from hardware expected to be more typical of a high-end (8 core, 8 Gig) server. Below is a screenshot of the new Windows 8 Task Manager on low-end hardware with 750 clients (562 Teredo, 188 IPHTTPS) running at ~77 Mbits/sec. This simulates users who do not present smart card credentials.

These test results indicate that Teredo performs better than IPHTTPS in Windows 8, and that bandwidth usage for both Teredo and IPHTTPS has improved compared to Windows 7.

Test results

High-end hardware test environment

The following chart shows the results of the high-end hardware performance test environment. All test results and analysis are detailed in this document.

| Configuration - Hardware | Low-end Hardware (4 GB RAM, 4 core) | High-end Hardware (8 GB RAM, 8 core) |
| --- | --- | --- |
| Double Tunnel (PKI, including DNS64/NAT64) | 750 concurrent connections at 50% CPU, 50% memory with Corpnet NIC throughput of 75 Mbps. Stretch target is 1000 users at 50% CPU. | 1500 concurrent connections at 50% CPU, 50% memory with Corpnet NIC throughput of 150 Mbps. |
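The capacity targets in the table above imply a per-connection bandwidth budget and a connections-per-core density. The following Python sketch works through that arithmetic. The dictionary layout and function names are illustrative assumptions; only the connection counts, core counts, and Corpnet throughput figures come from the table.

```python
# Capacity targets taken from the summary table (50% CPU, 50% memory).
# The structure and names below are illustrative, not part of DirectAccess.
TARGETS = {
    "low-end": {"cores": 4, "connections": 750, "corpnet_mbps": 75},
    "high-end": {"cores": 8, "connections": 1500, "corpnet_mbps": 150},
}


def per_connection_mbps(target):
    """Average Corpnet throughput available to each concurrent connection."""
    return target["corpnet_mbps"] / target["connections"]


def connections_per_core(target):
    """Concurrent connections carried per physical core."""
    return target["connections"] / target["cores"]


for name, target in TARGETS.items():
    print(
        f"{name}: {per_connection_mbps(target):.2f} Mbps per connection, "
        f"{connections_per_core(target):.1f} connections per core"
    )
```

Both configurations work out to the same 0.10 Mbps per connection and 187.5 connections per core, which suggests the targets scale roughly linearly with core count over this range.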

Test Environment

Perf Bench Topology


The performance test environment is a five-machine bench. For the low-end test, one 4-core, 4 Gig DirectAccess server was used, and for the high-end hardware test, one 8-core, 16 Gig DirectAccess server was used. Both test environments used one back-end server (the sender) and two client computers (the receivers). The receivers are split between the two client computers; otherwise, the receivers would be CPU bound and would limit the number of clients and the bandwidth. On the receiving side, a simulator is used to simulate hundreds of clients (either IPHTTPS or Teredo clients). IPsec and DoSP are both configured.

RSS is enabled on the DirectAccess server, and the RSS queue size is set to 8. Without RSS configured, a single processor gets pegged at high utilization while the other cores are underutilized. Also of note, the DirectAccess server is a 4-core machine with hyper-threading turned off. Hyper-threading is off because RSS only works on physical cores, and using hyper-threading produces skewed results (that is, not all the cores will be uniformly loaded).
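The effect of RSS described above can be illustrated with a toy model. The Python sketch below spreads simulated flows across receive queues by hashing a flow identifier. It is only a conceptual illustration under stated assumptions: real RSS is performed by the network adapter with its own hash function and indirection table, and the flow strings, addresses, and SHA-256 hash used here are hypothetical stand-ins.

```python
import hashlib
from collections import Counter


def queue_for_flow(flow_id, queue_count):
    """Pick a receive queue from a hash of the flow identifier (stand-in hash)."""
    digest = hashlib.sha256(flow_id.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % queue_count


def distribute(flows, queue_count):
    """Count how many flows land on each queue."""
    return Counter(queue_for_flow(flow, queue_count) for flow in flows)


# Hypothetical client flows: one made-up source address and port per client.
flows = [f"10.0.{i // 250}.{i % 250 + 1}:{50000 + i}" for i in range(1000)]

with_rss = distribute(flows, queue_count=8)     # 8, matching the RSS setting in the text
without_rss = distribute(flows, queue_count=1)  # everything on one queue

print("8 queues:", sorted(with_rss.items()))
print("1 queue: ", sorted(without_rss.items()))
```

With a single queue, all 1000 flows land on the same queue, which mirrors the single pegged processor described above. With eight queues, the flows are spread across all of them.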

Testing results for low-end hardware:

Testing was performed with both 1000 and 750 clients. In all cases, the traffic split was 70% Teredo and 30% IPHTTPS. All tests involved TCP traffic over NAT64 using 2 IPsec tunnels per client. In all tests, memory utilization was light and CPU utilization was acceptable.
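Because every client uses 2 IPsec tunnels, the expected number of active IPsec security associations can be estimated from the client count. The sketch below assumes one active quick mode SA per tunnel (an assumption, not something the source states) and reads the Active QMSA column in the result tables as quick mode SA counts. The function name is illustrative.

```python
TUNNELS_PER_CLIENT = 2  # from the test description


def expected_active_sas(clients, tunnels_per_client=TUNNELS_PER_CLIENT):
    """Estimate active SAs, assuming one SA per tunnel (assumption)."""
    return clients * tunnels_per_client


for clients in (750, 1000, 1500):
    print(f"{clients} clients -> about {expected_active_sas(clients)} active SAs")
```

These estimates (1500, 2000, and 3000) are close to the Active QMSA values reported for the 750-, 1000-, and 1500-client runs.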

Individual Test Results:

The following sections describe individual tests. Each section title highlights the key elements of each test followed by a summary description of the results and then a chart showing the detailed results data.

Low-end Perf: 750 clients, 70/30 split, 84.17 Mbits/sec throughput:

The following three tests represent low-end hardware. In the test runs below, there were 750 clients with an average throughput of 84.17 Mbits/sec and a traffic split of 562 Teredo and 188 IPHTTPS. Teredo MTU was set to 1472, and Teredo Shunt was enabled. CPU utilization averaged 46.42% across the three tests, and average memory utilization, expressed as a percentage of committed bytes of the total available memory of 4 GB, was 25.95%.

| Scenario | CPU Avg (from counter) | Mbit/s (Corp Side) | Mbit/s (Internet Side) | Active QMSA | Active MMSA | Mem Utilization (4 Gig system) |
| --- | --- | --- | --- | --- | --- | --- |
| Low-end HW. 562 Teredo clients. 188 IPHTTPS clients. | 47.7472542 | 84.3 | 119.13 | 1502.05 | 1502.1 | 26.27% |
| Low-end HW. 562 Teredo clients. 188 IPHTTPS clients. | 46.3889778 | 84.146 | 118.73 | 1501.25 | 1501.2 | 25.90% |
| Low-end HW. 562 Teredo clients. 188 IPHTTPS clients. | 45.113082 | 84.0494 | 118.43 | 1546.14 | 1546.1 | 25.68% |
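The averages quoted for these three runs can be reproduced directly from the table. A minimal Python sketch, with the per-run values copied from the rows above (the tuple layout is an illustrative choice):

```python
from statistics import mean

# (CPU %, corp-side Mbit/s, memory utilization %) for the three 750-client runs.
RUNS_750 = [
    (47.7472542, 84.3, 26.27),
    (46.3889778, 84.146, 25.90),
    (45.113082, 84.0494, 25.68),
]

cpu_avg = mean(run[0] for run in RUNS_750)
mbps_avg = mean(run[1] for run in RUNS_750)
mem_avg = mean(run[2] for run in RUNS_750)

print(f"CPU {cpu_avg:.2f}%, throughput {mbps_avg:.2f} Mbit/s, memory {mem_avg:.2f}%")
```

The results match the 46.42% CPU, 84.17 Mbits/sec, and 25.95% memory figures quoted above.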

1000 clients, 70/30 split, 78 Mbits/sec throughput:

The following three tests represent performance on low-end hardware. In the test runs below, there were 1000 clients with an average throughput of ~78.64 Mbits/sec and a traffic split of 700 Teredo and 300 IPHTTPS. Teredo MTU was set to 1472 and Teredo Shunt was enabled. CPU utilization averaged ~50.7%, and average memory utilization, expressed as a percentage of committed bytes of the total available memory of 4 GB, was ~28.7%.

| Scenario | CPU Avg (from counter) | Mbit/s (Corp Side) | Mbit/s (Internet Side) | Active QMSA | Active MMSA | Mem Utilization (4 Gig system) |
| --- | --- | --- | --- | --- | --- | --- |
| Low-end HW. 700 Teredo clients. 300 IPHTTPS clients. | 51.28406247 | 78.6432 | 113.19 | 2002.42 | 1502.1 | 25.59% |
| Low-end HW. 700 Teredo clients. 300 IPHTTPS clients. | 51.06993128 | 78.6402 | 113.22 | 2001.4 | 1501.2 | 30.87% |
| Low-end HW. 700 Teredo clients. 300 IPHTTPS clients. | 49.75235617 | 78.6387 | 113.2 | 2002.6 | 1546.1 | 30.66% |

1000 clients, 70/30 split, 109 Mbits/sec throughput:

In the following three test runs, there were 1000 clients with an average throughput of ~109.2 Mbits/sec and a traffic split of 700 Teredo and 300 IPHTTPS. Teredo MTU was set to 1472 and Teredo Shunt was enabled. CPU utilization averaged ~59.06% across the three tests, and average memory utilization, expressed as a percentage of committed bytes of the total available memory of 4 GB, was ~27.34%.

| Scenario | CPU Avg (from counter) | Mbit/s (Corp Side) | Mbit/s (Internet Side) | Active QMSA | Active MMSA | Mem Utilization (4 Gig system) |
| --- | --- | --- | --- | --- | --- | --- |
| Low-end HW. 700 Teredo clients. 300 IPHTTPS clients. | 59.81640675 | 108.305 | 153.14 | 2001.64 | 2001.6 | 24.38% |
| Low-end HW. 700 Teredo clients. 300 IPHTTPS clients. | 59.46473798 | 110.969 | 157.53 | 2005.22 | 2005.2 | 28.72% |
| Low-end HW. 700 Teredo clients. 300 IPHTTPS clients. | 57.89089768 | 108.305 | 153.14 | 1999.53 | 2018.3 | 24.38% |
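The two 1000-client test sets differ mainly in throughput, so they give a rough sense of how CPU load tracks traffic volume on the low-end server. The sketch below uses the average figures quoted in the text. The CPU-per-Mbit/s ratio is an illustrative metric introduced here, not one the source reports.

```python
# Average (CPU %, corp-side Mbit/s) for the two 1000-client test sets.
SET_78_MBPS = (50.7, 78.64)
SET_109_MBPS = (59.06, 109.2)


def cpu_per_mbps(cpu_percent, mbps):
    """CPU percentage points consumed per Mbit/s of corp-side throughput."""
    return cpu_percent / mbps


throughput_gain = SET_109_MBPS[1] / SET_78_MBPS[1] - 1
cpu_increase = SET_109_MBPS[0] - SET_78_MBPS[0]

print(f"~78 Mbit/s set:  {cpu_per_mbps(*SET_78_MBPS):.3f} CPU % per Mbit/s")
print(f"~109 Mbit/s set: {cpu_per_mbps(*SET_109_MBPS):.3f} CPU % per Mbit/s")
print(f"Throughput up {throughput_gain:.0%}, CPU up {cpu_increase:.2f} points")
```

Under these figures, a roughly 39% increase in throughput at the same client count raised average CPU utilization by about 8.4 percentage points.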

Testing results for high-end hardware:

Testing was performed with 1500 clients. The traffic split was 70% Teredo and 30% IPHTTPS. All tests involved TCP traffic over NAT64 using 2 IPsec tunnels per client. In all tests, memory utilization was light and CPU utilization was acceptable.

Individual Test Results:

The following sections describe individual tests. Each section title highlights the key elements of each test followed by a summary description of the results and then a chart containing the detailed results data.

1500 clients, 70/30 split, 153.2 Mbits/sec throughput

The following six tests represent high-end hardware. In the test runs below, there were 1500 clients with an average throughput of 153.2 Mbits/sec and a traffic split of 1050 Teredo and 450 IPHTTPS. CPU utilization averaged 50.68% across the six tests, and average memory utilization, expressed as a percentage of committed bytes of the total available memory of 8 GB, was 22.25%.

| Scenario | CPU Avg (from counter) | Mbit/s (Corp Side) | Mbit/s (Internet Side) | Active QMSA | Active MMSA | Mem Utilization (8 Gig system) |
| --- | --- | --- | --- | --- | --- | --- |
| High-end HW. 1050 Teredo clients. 450 IPHTTPS clients. | 51.712437 | 157.029 | 216.29 | 3000.31 | 3046 | 21.58% |
| High-end HW. 1050 Teredo clients. 450 IPHTTPS clients. | 48.86020205 | 151.012 | 206.53 | 3002.86 | 3045.3 | 21.15% |
| High-end HW. 1050 Teredo clients. 450 IPHTTPS clients. | 52.23979519 | 155.511 | 213.45 | 3001.15 | 3002.9 | 22.90% |
| High-end HW. 1050 Teredo clients. 450 IPHTTPS clients. | 51.26269767 | 155.09 | 212.92 | 3000.74 | 3002.4 | 22.91% |
| High-end HW. 1050 Teredo clients. 450 IPHTTPS clients. | 50.15751307 | 154.772 | 211.92 | 3000.9 | 3002.1 | 22.93% |
| High-end HW. 1050 Teredo clients. 450 IPHTTPS clients. | 49.83665607 | 145.994 | 201.92 | 3000.51 | 3006 | 22.03% |
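In every run, the internet-side throughput is higher than the corp-side throughput. One way to quantify the gap is the ratio between the two columns, sketched below in Python with values copied from the table above. The source does not explain the difference, so the ratio is offered only as an observation; the variable names are illustrative.

```python
# (corp-side Mbit/s, internet-side Mbit/s) for each high-end run in the table.
HIGH_END_RUNS = [
    (157.029, 216.29),
    (151.012, 206.53),
    (155.511, 213.45),
    (155.09, 212.92),
    (154.772, 211.92),
    (145.994, 201.92),
]

ratios = [internet / corp for corp, internet in HIGH_END_RUNS]

for (corp, internet), ratio in zip(HIGH_END_RUNS, ratios):
    print(f"corp {corp:8.3f}  internet {internet:7.2f}  ratio {ratio:.3f}")

print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")
```

For these runs the ratio stays between roughly 1.37 and 1.38, which may be useful when sizing the internet-facing link relative to expected corp-side traffic.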

High end hardware test results
