Blue Screen Appears on a Node Running a GPGPU Job

 

Applies To: Windows HPC Server 2008 R2, Windows HPC Server 2008

If a blue screen occurs on a compute node that is running a long-running general-purpose computation on graphics processing units (GPGPU) job, and the GPU uses a Windows Display Driver Model (WDDM) driver, you may need to modify or disable the timeout detection and recovery registry setting for the GPU on each compute node. WDDM resets a GPU that does not respond within the timeout (2 seconds by default), so a long-running computation can be mistaken for a hung GPU.

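To disable timeout detection and recovery, under HKLM\System\CurrentControlSet\Control\GraphicsDrivers, set the TdrLevel value (REG_DWORD) to 0. To lengthen the timeout instead of disabling detection, set the TdrDelay value (REG_DWORD) under the same key to the number of seconds to wait. Restart the node for the change to take effect. For more information, see Timeout Detection and Recovery of GPUs through WDDM (https://go.microsoft.com/fwlink/?LinkId=196045).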
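The registry change described above can be scripted so that it is applied the same way on every node. The following Python sketch is illustrative only: it builds a command line for the standard reg.exe tool rather than editing the registry directly, and the function names (build_reg_add_command, disable_tdr, apply) are hypothetical helpers, not part of Windows HPC Server. The key name is written as GraphicsDrivers, as it appears in the Microsoft documentation of the TDR registry values. It assumes Python is available on the node and that the script runs from an elevated prompt.

```python
import subprocess
import sys

# Registry key that holds the WDDM timeout detection and recovery values.
TDR_KEY = r"HKLM\System\CurrentControlSet\Control\GraphicsDrivers"


def build_reg_add_command(value_name, data):
    """Return a reg.exe command (as a list) that writes one REG_DWORD under TDR_KEY."""
    if not 0 <= data <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD data must fit in 32 bits")
    return [
        "reg", "add", TDR_KEY,
        "/v", value_name,    # name of the registry value
        "/t", "REG_DWORD",   # value type
        "/d", str(data),     # value data
        "/f",                # overwrite an existing value without prompting
    ]


def disable_tdr():
    """Command that sets TdrLevel to 0, which turns timeout detection off."""
    return build_reg_add_command("TdrLevel", 0)


def apply(command):
    """Run the command on this node. Only meaningful on Windows, as administrator."""
    if sys.platform != "win32":
        raise RuntimeError("reg.exe is only available on Windows")
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command for review; call apply(disable_tdr()) to make the change.
    print(" ".join(disable_tdr()))
```

To lengthen the timeout instead of disabling detection, the same helper could be called as build_reg_add_command("TdrDelay", 10), where 10 is an arbitrary example value in seconds. Back up the registry before applying the change, and restart the node afterward.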

Warning

Incorrectly editing the registry may severely damage your system. Before making changes to the registry, you should back up any valuable data on the computer.