Understanding Non-uniform Memory Access
Microsoft SQL Server is non-uniform memory access (NUMA) aware, and performs well on NUMA hardware without special configuration. As clock speed and the number of processors increase, it becomes increasingly difficult to reduce the memory latency required to use this additional processing power. To circumvent this, hardware vendors provide large L3 caches, but this is only a limited solution. NUMA architecture provides a scalable solution to this problem. SQL Server has been designed to take advantage of NUMA-based computers without requiring any application changes.
The trend in hardware has been towards more than one system bus, each serving a small set of processors. Each group of processors has its own memory and possibly its own I/O channels. However, each CPU can access memory associated with the other groups in a coherent way. Each group is called a NUMA node. The number of CPUs within a NUMA node depends on the hardware vendor. It is faster to access local memory than the memory associated with other NUMA nodes. This is the reason for the name, non-uniform memory access architecture.
On NUMA hardware, some regions of memory are on physically different buses from other regions. Because NUMA uses local and foreign memory, it will take longer to access some regions of memory than others. Local memory and foreign memory are typically used in reference to a currently running thread. Local memory is the memory that is on the same node as the CPU currently running the thread. Any memory that does not belong to the node on which the thread is currently running is foreign. Foreign memory is also known as remote memory. The ratio of the cost to access foreign memory over that for local memory is called the NUMA ratio. If the NUMA ratio is 1, it is symmetric multiprocessing (SMP). The greater the ratio, the more it costs to access the memory of other nodes. Windows applications that are not NUMA aware (including SQL Server 2000 SP3 and earlier) sometimes perform poorly on NUMA hardware.
The main benefit of NUMA is scalability. The NUMA architecture was designed to surpass the scalability limits of the SMP architecture. With SMP, all memory access is posted to the same shared memory bus. This works fine for a relatively small number of CPUs, but not when you have dozens, even hundreds, of CPUs competing for access to the shared memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one memory bus and connecting the various nodes by means of a high speed interconnection.
NUMA can match memory with CPUs through specialized hardware (hardware NUMA) or by configuring SQL Server memory (soft-NUMA). During startup, SQL Server configures itself based on underlying operating system and hardware configuration or the soft-NUMA setting. For both hardware and soft-NUMA, when SQL Server starts in a NUMA configuration, the SQL Server log records a multimode configuration message for each node, along with the CPU mask.
Computers with hardware NUMA have more than one system bus, each serving a small set of processors. Each group of processors has its own memory and possibly its own I/O channels, but each CPU can access memory associated with other groups in a coherent way. Each group is called a NUMA node. The number of CPUs within a NUMA node depends on the hardware vendor. Your hardware manufacturer can tell you if your computer supports hardware NUMA.
If you have hardware NUMA, it may be configured to use interleaved memory instead of NUMA. In that case, Windows and therefore SQL Server will not recognize it as NUMA. Run the following query to find the number of memory nodes available to SQL Server:
SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks
If SQL Server returns only a single memory node (node 0), either you do not have hardware NUMA, or the hardware is configured as interleaved (non-NUMA). If you think your hardware NUMA is configured incorrectly, contact your hardware vendor to enable NUMA. SQL Server ignores NUMA configuration when hardware NUMA has four or less CPUs and at least one node has only one CPU.
SQL Server allows you to group CPUs into nodes referred to as soft-NUMA. You usually configure soft-NUMA when you have many CPUs and do not have hardware NUMA, but you can also use soft-NUMA to subdivide hardware NUMA nodes into smaller groups. Only the SQL Server scheduler and SQL Server Network Interface (SNI) are soft-NUMA aware. Memory nodes are created based on hardware NUMA and therefore not impacted by soft-NUMA. So, for example, if you have an SMP computer with eight CPUs and you create four soft-NUMA nodes with two CPUs each, you will only have one memory node serving all four NUMA nodes. Soft-NUMA does not provide memory to CPU affinity.
The benefits of soft-NUMA include reducing I/O and lazy writer bottlenecks on computers with many CPUs and no hardware NUMA. There is a single I/O thread and a single lazy writer thread for each NUMA node. Depending on the usage of the database, these single threads may be a significant performance bottleneck. Configuring four soft-NUMA nodes provides four I/O threads and four lazy writer threads, which could increase performance.
You cannot create a soft-NUMA that includes CPUs from different hardware NUMA nodes. For example, if your hardware has eight CPUs (0..7) and you have two hardware NUMA nodes (0-3 and 4-7), you can create soft-NUMA by combining CPU(0,1) and CPU(2,3). You cannot create soft-NUMA using CPU (1, 5), but you can use CPU affinity to affinitize an instance of SQL Server to CPUs from different NUMA nodes. So in the previous example, if SQL Server uses CPUs 0-3, you will have one I/O thread and one lazy writer thread. If, in the previous example SQL Server uses CPUs 1, 2, 5, and 6, you will access two NUMA nodes and have two I/O threads and two lazy writer threads.
Some hardware configurations share common resources like an L3/L4 cache. Processors can be grouped around these shared resources to create soft-NUMA nodes.
For more information, see How to: Configure SQL Server to Use Soft-NUMA.