
Walkthrough: Launching the MPI Cluster Debugger in Visual Studio 2008

Updated: February 3, 2010

This walkthrough describes how to configure and launch an MPI Cluster Debugger session on your local computer and on a Microsoft Windows HPC Server 2008 cluster. This walkthrough includes the steps and the sample code that you need to create an application that uses Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) application programming interfaces (APIs).

Before following the steps in this guide, ensure that you have the required software, software updates, and system configurations as listed in Requirements for Using the MPI Cluster Debugger.

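This guide includes the following sections:

  • Create a C++ MPI sample project in Visual Studio 2008

  • Configure and launch the MPI Cluster Debugger

  • Appendix: Files deployed by Visual Studio in addition to the application binaries (and CRT if requested)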

Create a C++ MPI sample project in Visual Studio 2008

The sample code in this section is for a parallel application that approximates the value of pi by using a Monte Carlo simulation.

The sample code runs 50,000,000 iterations on each MPI process by default. In each iteration, the sample code generates two random numbers in the interval [0,1] and uses them as the x and y coordinates of a point. The point is then tested to determine whether it falls inside the quarter circle bounded by the curve x² + y² = 1. If it does, the variable count is increased by one. The value of count from each MPI process is summed into the variable result. Because the quarter circle has an area of pi/4 and the unit square has an area of 1, the total number of points that fell inside the curve (result) is multiplied by four and then divided by the total number of iterations to approximate the value of pi.
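The arithmetic described above can be sketched without MPI. This is a minimal illustration rather than the walkthrough's sample: it uses the standard C++11 <random> names instead of the TR1 names, and the function names CountHits and EstimatePi, as well as the per-process seed parameter, are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Counts how many of `iterations` random points in the unit square fall
// inside the quarter circle x*x + y*y < 1. The seed parameter is an
// assumption of this sketch; it lets each process draw a different stream.
long long CountHits(long long iterations, unsigned int seed)
{
    assert(iterations >= 0);
    std::mt19937 engine(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    long long count = 0;
    for (long long i = 0; i < iterations; ++i)
    {
        const double x = unit(engine);
        const double y = unit(engine);
        if (x * x + y * y < 1.0)
        {
            ++count;
        }
    }
    return count;
}

// Combines the per-process counts: pi is approximately
// 4 * (total hits) / (total iterations).
double EstimatePi(const std::vector<long long>& counts, long long iterationsPerProcess)
{
    assert(!counts.empty() && iterationsPerProcess > 0);
    long long result = 0;
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
        result += counts[i];
    }
    // Multiply in floating point so a large iteration count cannot overflow.
    const double total = static_cast<double>(iterationsPerProcess) *
                         static_cast<double>(counts.size());
    return 4.0 * static_cast<double>(result) / total;
}
```

Calling CountHits once per simulated process with a different seed, and then passing the collected counts to EstimatePi, mirrors the roles of count and result in the description above.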

The following procedure includes two implementations of the Monte Carlo simulation.

To create the sample project

  1. Run Visual Studio 2008.

  2. Create a new C++ Win32 Console application named ParallelPI. Use a project without precompiled headers.

    1. On the File menu, point to New, and then click Project.

    2. In the New Project dialog box, in Project types, select Visual C++. (Depending on how you set up Visual Studio, Visual C++ may be under Other Project Types.)

    3. In Templates, click Win32 Console Application.

    4. For the project name, type: ParallelPI.

    5. Click OK. This opens the Win32 Console Application Wizard.

    6. Click Next.

    7. In Application Settings, under Additional options, clear the Precompiled header check box.

    8. Click Finish to close the wizard and create the project.

  3. Specify additional properties for the project.

    1. In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.

    2. Expand Configuration Properties, expand C/C++, and then select General.

      In Additional Include Directories, specify the location of the MS-MPI C header files. For example:

      C:\Program Files\Microsoft HPC Pack 2008 SDK\Include;
      
    3. In Configuration Properties, expand Linker, and then select General.

      In Additional Library Directories, specify the location of the Microsoft HPC Pack 2008 SDK library file.

      For example, if you want to build and debug a 32-bit application:

      C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386;
      
      If you want to build and debug a 64-bit application:

      C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\amd64;
      
    4. Under Linker, select Input.

      In Additional Dependencies, place the cursor at the beginning of the list that appears in the text box, and then type the following:

      msmpi.lib

    5. If you are using the code sample with OpenMP:

      In Configuration Properties, expand C/C++, and then select Language.

      In OpenMP Support, select Yes (/openmp) to enable compiler support for OpenMP.

    6. Click OK to save your settings and close the property pages.

  4. In the main source file, select all the code and then delete it.

  5. Paste one of the following code samples into the empty source file. The first sample uses MPI and OpenMP, and the second sample uses MPI and Parallel Patterns Library (PPL).

    The following code sample uses MPI and OpenMP. The function ThrowDarts uses an OpenMP parallel for loop to utilize the multicore hardware if available.

    // ParallelPI.cpp : Defines the entry point for the MPI application.
    //
    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>
    #include <random>
    
    //Counts the random points that fall inside the quarter circle.
    //The rank is used only to give each MPI process different seeds.
    int ThrowDarts(int iterations, int rank)
    {
        int count = 0;
    
        //Compute approximation of pi on each node
        #pragma omp parallel reduction(+:count)
        {
            //Each thread owns its generator, so threads never share
            //generator state. uniform_real<double> defaults to [0,1).
            //The seed assumes fewer than 1024 threads per process.
            std::tr1::uniform_real<double> MyRandom;
            std::tr1::minstd_rand0 MyEngine(
                (unsigned long)(1 + rank * 1024 + omp_get_thread_num()));
    
            #pragma omp for
            for(int i = 0; i < iterations; ++i)
            {
                double x = MyRandom(MyEngine);
                double y = MyRandom(MyEngine);
    
                if(x*x + y*y < 1.0)
                {
                    //count is private to each thread and summed by the
                    //reduction clause, so no lock is needed.
                    count++;
                }
            }
        }
    
        return count;
    }
    
    int main(int argc, char* argv[])
    {
        int rank;
        int size;
        int iterations;
        int count;
        int result;
        MPI_Status s;
    
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);
    
        if(rank == 0)
        {
            //Rank 0 reads the number of iterations from the command line.
            //50M iterations is the default.
            iterations = 50000000;
            if(argc > 1)
            {
                iterations = atoi(argv[1]);
            }
            printf("Executing %d iterations.\n", iterations);
            fflush(stdout);
        }
        //Send the number of iterations to every other rank.
        if(rank == 0)
        {
            for(int i = 1; i < size; ++i)
            {
                MPI_Ssend(&iterations, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
        }
        else
        {
            MPI_Recv(&iterations, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &s);
        }
    
        //Collective equivalent of the point-to-point exchange above:
        //MPI_Bcast(&iterations, 1, MPI_INT, 0, MPI_COMM_WORLD);
    
        count = ThrowDarts(iterations, rank);
    
        //Gather and sum results
        if(rank != 0)
        {
            MPI_Ssend(&count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        else
        {
            for(int i = 1; i < size; ++i)
            {
                int TempCount = 0;
                MPI_Recv(&TempCount, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &s);
                count += TempCount;
            }
        }
        result = count;
    
        //Collective equivalent of the point-to-point exchange above:
        //MPI_Reduce(&count, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    
        if(rank == 0)
        {
            //Divide in double precision: iterations*size can overflow an
            //int, and float cannot represent counts this large exactly.
            printf("The value of PI is approximated to be: %16f\n", 4.0*(double)result/((double)iterations*(double)size));
        }
    
        MPI_Barrier(MPI_COMM_WORLD);
    
        MPI_Finalize();
        return 0;
    }
    
    

     

    The following code sample uses Parallel Patterns Library (PPL) instead of OpenMP, and it uses the MPI collective operations instead of point-to-point operations.

     

    // ParallelPI.cpp : Defines the entry point for the MPI application.
    //
    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <ppl.h>
    #include <random>
    
    using namespace Concurrency;
    
    //Counts the random points that fall inside the quarter circle.
    //The rank is used only to give each MPI process different seeds.
    int ThrowDarts(int iterations, int rank)
    {
        //The work is split into a fixed number of chunks. Each chunk owns
        //one generator that is seeded once, so points are not repeated
        //(seeding from time() in every iteration would repeat them).
        const int chunks = 64;
    
        combinable<int> count;
    
        parallel_for(0, chunks, [&](int chunk)
        {
            //uniform_real<double> defaults to the range [0,1).
            std::tr1::uniform_real<double> MyRandom;
            std::tr1::minstd_rand0 MyEngine(
                (unsigned long)(1 + rank * chunks + chunk));
    
            //The first (iterations % chunks) chunks take one extra iteration.
            int n = iterations / chunks + (chunk < iterations % chunks ? 1 : 0);
            int hits = 0;
    
            for(int i = 0; i < n; ++i)
            {
                double x = MyRandom(MyEngine);
                double y = MyRandom(MyEngine);
    
                if(x*x + y*y < 1.0)
                {
                    ++hits;
                }
            }
    
            count.local() += hits;
        });
    
        return count.combine([](int left, int right) { return left + right; });
    }
    
    int main(int argc, char* argv[])
    {
        int rank;
        int size;
        int iterations;
        int count;
        int result;
    
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Comm_size(MPI_COMM_WORLD,&size);
    
        if(rank == 0)
        {
            //Rank 0 reads the number of iterations from the command line.
            //50M iterations is the default.
            iterations = 50000000;
            if(argc > 1)
            {
                iterations = atoi(argv[argc-1]);
            }
            printf("Executing %d iterations on %d nodes.\n", iterations, size);
            fflush(stdout);
        }
        //Broadcast the number of iterations to execute.
        MPI_Bcast(&iterations, 1, MPI_INT, 0, MPI_COMM_WORLD);
    
        count = ThrowDarts(iterations, rank);
    
        //Gather and sum results
        MPI_Reduce(&count, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    
        if(rank == 0)
        {
            //Multiply as doubles so that iterations*size cannot overflow an int.
            printf("The value of PI is approximated to be: %16f\n", 4.0*(double)result/((double)iterations*(double)size));
        }
    
        MPI_Barrier(MPI_COMM_WORLD);
    
        MPI_Finalize();
        return 0;
    }
    
    
  6. On the File menu, click Save All.

  7. On the Build menu, click Build ParallelPI.

Configure and launch the MPI Cluster Debugger

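After building your application, you are ready to configure and launch the debugger. This section describes three options for debugging:

  • Debug one MPI process on the local computer

  • Debug multiple MPI processes on the local computer

  • Debug one or more MPI processes on a cluster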

Important
MPI programs communicate through IP over ports. The first time you launch an MPI program, you may see a security warning from the firewall that indicates a port is being opened. Read the warning message and ensure that you understand the changes that you are making to your system. You must unblock the firewall to continue debugging on the local computer.

Note
In the MPI Cluster Debugger, you cannot start without debugging. Pressing Ctrl+F5 (or Start without debugging on the Debug menu) also starts debugging.

Debug one MPI process on the local computer

To debug on the local computer by using only one MPI process, use the same process that you would use to debug any other application. Set a breakpoint at the desired location in your program, and then press F5 to start the debugger.

Debug multiple MPI processes on the local computer

The following procedure describes how to start a local debugging session for ParallelPI. ParallelPI accepts one argument that determines the number of iterations to run. The default is set to 50,000,000. The following procedure includes a step to reduce the iterations to 5,000.
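The samples read this argument with atoi, which returns 0 for text that is not a number. A stricter way to read it could look like the following sketch. The function name ParseIterations and the decision to fall back to the default on any invalid input are assumptions made for illustration, not part of the walkthrough's sample code.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Returns the iteration count from the first command-line argument, or
// `fallback` when the argument is missing, not a number, or out of range.
int ParseIterations(int argc, const char* const argv[], int fallback)
{
    assert(argv != 0);
    if (argc < 2)
    {
        return fallback;
    }

    char* end = 0;
    errno = 0;
    const long value = std::strtol(argv[1], &end, 10);

    // The whole argument must be consumed, and the value must fit a positive int.
    const bool parsed = (end != argv[1]) && (*end == '\0');
    if (!parsed || errno == ERANGE || value <= 0 || value > INT_MAX)
    {
        return fallback;
    }
    return static_cast<int>(value);
}
```

In the samples, rank 0 could call ParseIterations(argc, argv, 50000000) in place of the atoi call, so that an application argument of 5000 yields 5,000 iterations and a mistyped argument keeps the default.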

To start the MPI Cluster Debugger with four MPI processes running on your local computer

  1. In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.

  2. Expand Configuration Properties, and then select Debugging.

  3. Under Debugger to launch, select MPI Cluster Debugger.

  4. To reduce the iterations to 5,000: In Application Arguments, type 5000.

  5. Click OK to save the changes and close the Property Pages.

  6. On the Tools menu, click Cluster Debugger Configuration. This opens the Cluster Debugger Configuration pane.

  7. In Cluster Debugger Configuration, specify the following properties:

    • In Cluster head node, select localhost.

    • In Number of processes, type 4.

  8. Set a breakpoint within the body of the parallel for loop.

  9. Press F5 to launch the debugger.

  10. Five console windows appear: one cmd.exe window, and four ParallelPI.exe windows (one for each process that you launched). The console window that corresponds to the rank 0 process indicates the number of iterations and the calculated approximation of pi.

  11. On the Debug menu, click Windows, and then click Processes.

  12. Set the active process for debugging by double-clicking a process in the Processes window.

Note
When you are debugging multiple processes, by default, a breakpoint affects all processes that are being debugged. To avoid breaking processes in unintended places, clear the Break all processes when one process breaks option. (On the Tools menu, click Options, and then select Debugging.) For more information about how to change break behavior, see Execution Control.

Debug one or more MPI processes on a cluster

When you launch the MPI Cluster Debugger on a cluster, the debugger submits your application to the cluster as a job. The Visual C++ runtimes that match your project (x86 or x64, and debug or release) must be present in the working directory on the compute nodes. If the correct runtimes are not already on the compute nodes, you must include them in the debugger deployment by specifying the Additional Files to Deploy property.

The following procedure includes a step to deploy the OpenMP debug runtime DLL. By default, the C run-time (CRT) library is deployed when you launch the MPI Cluster Debugger. If the correct runtimes are not present, you will see side-by-side errors when you try to run your application. If the OpenMP runtime is not included, the breakpoints will not be hit.

To launch the MPI Cluster Debugger on a cluster

  1. In Solution Explorer, right-click ParallelPI, and then click Properties. This opens the Property Pages dialog box.

  2. Expand Configuration Properties, and then select Debugging.

  3. Under Debugger to launch, select MPI Cluster Debugger.

  4. Click OK to save changes and close Property Pages.

  5. On the Tools menu, click Cluster Debugger Configuration. This opens the Cluster Debugger Configuration pane.

  6. In Cluster Debugger Configuration, specify the following properties:

    • In the Cluster head node drop-down list, select the name of the head node for the cluster that you want to use.

      The list of head nodes is populated from the Active Directory domain controller. Only clusters in your domain appear in the list. If you do not see your head node, type the name or the IPv4 address of the head node in the property field.

    • In Number of processes, type 4.

    • Expand Advanced Configurations.

    • In Execution\work directory, specify a local working directory on each compute node. For example, type the following, where <myUserName> is your user name:

      C:\Users\<myUserName>\ParallelPI

  7. If you are using the sample code with OpenMP, add the OpenMP debug runtime DLL file in the Cluster Debugger Configuration properties as follows:

    1. In Advanced Configuration, in Additional Files to Deploy, select <Edit…>. This opens the File and Folder Selector dialog box.

    2. Click Add File, navigate to Microsoft.VC90.DebugOpenMP\vcomp90d.dll, select the file, and then click Open.

      For example, for an x86 (32-bit) build, the default location on a 64-bit edition of the Windows Server 2008 operating system is:

      C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\redist\Debug_NonRedist\x86\Microsoft.VC90.DebugOpenMP\vcomp90d.dll

    3. Click OK to add the file and close the File and Folder Selector dialog box.

  8. Set a breakpoint within the body of the parallel for loop.

  9. Press F5 to launch the debugger.

  10. Because you are submitting a job to the cluster, you are prompted to enter your password to connect to the cluster. Type your password, and then press ENTER.

  11. After the debugger launches, look at the Processes window to verify the placement of the processes. For each process, look at the Transport Qualifier column to view the compute node on which the process is running.
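As a cross-check on the Transport Qualifier column, each process can also report its own host name. The following is a minimal sketch, not part of the walkthrough's sample: the function names FormatPlacement and ReportPlacement and the macro PARALLELPI_WITH_MPI are hypothetical, and the MPI part is compiled only when that macro is defined in a project that is set up for MS-MPI as described earlier.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Builds one report line. It is kept separate from MPI so that it can be
// checked without a cluster.
std::string FormatPlacement(int rank, int size, const std::string& host)
{
    assert(rank >= 0 && rank < size);
    std::ostringstream line;
    line << "Rank " << rank << " of " << size << " is running on " << host;
    return line.str();
}

#ifdef PARALLELPI_WITH_MPI  // hypothetical macro: define it when building with MS-MPI
#include "mpi.h"

// Prints the placement of the calling process.
// Call this between MPI_Init and MPI_Finalize.
void ReportPlacement()
{
    int rank = 0;
    int size = 0;
    int length = 0;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &length);

    std::printf("%s\n", FormatPlacement(rank, size, std::string(host, length)).c_str());
    std::fflush(stdout);
}
#endif
```

If ReportPlacement is called right after MPI_Comm_size in main, each console shows one line per process, which can be compared with the compute node names listed in the Processes window.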

Appendix: Files deployed by Visual Studio in addition to the application binaries (and CRT if requested)

  • DebuggerProxy.dll

  • DebuggerProxy.dll.manifest

  • Delete_from_workdir.bat: A script to delete the files that are deployed

  • Deploy_to_workdir.bat: A script to copy files from the Deployment Directory to the work directory

  • dbghelp.dll

  • mcee.dll

  • Mpishim.bat: A script to launch the remote debugger

  • Mpishim.exe: A program that orchestrates communication between the IDE and Msvsmon.exe

  • Msvsmon.exe: The remote debugger

  • Msvsmon.exe.config

  • PfxTaskProvider.dll

  • symsrv.dll

  • symsrv.yes

  • vbdebug.dll

  • 1033\msdbgui.dll

  • 1033\vbdebugui.dll
