
The Discussion

Let us now have a closer look at the code and see what happens there.

The code begins with the usual incantations: MPI_Init, MPI_Comm_size, MPI_Comm_rank. But this time we have two special processes: the process with rank 0, which, as usual, is responsible for I/O, and the process with rank numprocs - 1, which we are going to call the server.

  MPI_Init(&argc, &argv);
  world = MPI_COMM_WORLD;
  MPI_Comm_size(world, &numprocs);
  MPI_Comm_rank(world, &myid);
  server = numprocs - 1;

Process rank 0 inspects the command line in order to find the value of $\epsilon$, which is going to be compared to $|\pi - \pi'|$, where $\pi'$ is the approximate value of $\pi$ evaluated by our Monte Carlo computation. The value is then broadcast to all other processes:

  if (myid == 0)
    sscanf( argv[1], "%lf", &epsilon);
  MPI_Bcast(&epsilon, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
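
The synopsis of MPI_Bcast, for reference: the first argument is the buffer whose contents are sent by the root process and overwritten on all other processes, followed by the count of data items, their type, the rank of the root within the communicator (here 0), and the communicator itself:

  int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                int root, MPI_Comm comm);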

Now we commence the formation of the new communicator:

  MPI_Comm_group( world, &world_group);
  ranks[0] = server;
  MPI_Group_excl(world_group, 1, ranks, &worker_group);
  MPI_Comm_create(world, worker_group, &workers);
  MPI_Group_free(&worker_group);
In order to construct the new communicator we must first extract the group of processes associated with the original communicator, which is referred to here as world. Then we can perform an operation on that group: we can exclude some processes from it, we can select a subset of its processes with MPI_Group_incl, or, given two groups, we can find their intersection with MPI_Group_intersection or form their union with MPI_Group_union. Once we have formed a new group of processes, we can associate a new communicator with it by calling MPI_Comm_create. And once that new communicator has been created, we no longer need to keep the group it was built from around, so we dispose of it by calling MPI_Group_free.
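
For illustration only (this is not part of the program), the same worker group could have been built positively with MPI_Group_incl, by listing the ranks to keep rather than the rank to drop. The array worker_ranks and the bound MAX_PROCS are hypothetical names introduced here:

  int i, worker_ranks[MAX_PROCS];
  for (i = 0; i < numprocs - 1; i++)   /* every rank except the server */
    worker_ranks[i] = i;
  MPI_Group_incl(world_group, numprocs - 1, worker_ranks, &worker_group);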

The synopsis for MPI_Comm_group is as follows: the first argument is a communicator, the second argument is a pointer to a container that is going to hold a group of processes associated with the communicator once the function returns.

The synopsis for MPI_Group_excl: the first argument is the group from which processes are to be excluded, then we have to specify how many processes are to be excluded, and then we have to pass an array whose entries are the ranks of the processes to be excluded. The last argument is a pointer to a container that will hold the new group once the function returns.

The synopsis for MPI_Comm_create: here the first argument is the original communicator, the second argument is the group of processes with which the new communicator is to be associated, and the third argument is a pointer to a container that will hold the new communicator once the function returns.
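
For reference, the C prototypes of these three functions are:

  int MPI_Comm_group(MPI_Comm comm, MPI_Group *group);
  int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup);
  int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm);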

Having excluded the random number server from the pool we can subdivide our program into a server part and a worker part.

The server part is simple:

  if(myid == server) {
    do {
      MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, REQUEST, world, &stat);
      if (request) {
        for (i = 0; i < CHUNKSIZE; i++)
          rands[i] = random();
        MPI_Send(rands, CHUNKSIZE, MPI_INT, stat.MPI_SOURCE, REPLY, world);
      }
    }
    while (request > 0);
  }
The server waits for a request, which is a single integer. The tag of this message is REQUEST, and the message is expected to arrive from MPI_ANY_SOURCE within the world communicator. If the integer sent in request is greater than zero, the server generates CHUNKSIZE random integers between 0 and max, which is equal to INT_MAX, a system constant defined in the header <limits.h>. The array rands, filled with CHUNKSIZE random integers, is then sent back to process stat.MPI_SOURCE, which is the same process that submitted the original request.
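
The structure stat is of type MPI_Status. In C it exposes the fields MPI_SOURCE, MPI_TAG, and MPI_ERROR, which is how a wildcard receive learns where the message actually came from. A minimal sketch, separate from the program above:

  MPI_Status stat;
  int request;
  MPI_Recv(&request, 1, MPI_INT, MPI_ANY_SOURCE, REQUEST, world, &stat);
  /* stat.MPI_SOURCE is the rank of whoever sent the request,
     stat.MPI_TAG is the tag it was sent with */
  printf("request %d from rank %d, tag %d\n",
         request, stat.MPI_SOURCE, stat.MPI_TAG);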

If the value of request is less than 1, then the server process terminates.

Now let us have a look at what the worker processes are going to do at the same time.

Their life begins with some initializations that by now should be pretty obvious:

    request = 1;
    done = in = out = 0;
    max = INT_MAX;
Then each worker sends a request to the random number server and finds out its own rank within the workers communicator. No receive is attempted at this stage:
    MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
    MPI_Comm_rank(workers, &workerid);

Now the workers enter the main loop:

    iter = 0;
    while(!done) {
      iter++;

      ...

      error = fabs( Pi-3.141592653589793238462643);
      done = ((error < epsilon) || ((totalin+totalout) > 1000000));
      request = (done) ? 0 : 1;
      if (myid == 0) {
        printf( "\rpi = %23.20lf", Pi);
        MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
      }
      else {
        if (request)
          MPI_Send(&request, 1, MPI_INT, server, REQUEST, world);
      }
    }
Within this loop some computation and exchange of information is done; we'll focus on that part later. After every process has some idea of what its value of $\pi'$ is, that value is compared to the exact value of $\pi$ and the absolute value of the difference is stored in error. Then we check whether that error is less than $\epsilon$, or whether we have exceeded a maximum number of points for all processes taken together. If either of these conditions is satisfied, even if $\pi'$ has not yet been evaluated with sufficient accuracy, we flag the job as done and request is set to 0; otherwise it is set to 1. It falls to the speaker process, i.e., the process whose rank within the world communicator is 0, to send that terminating message to the server. The speaker process always sends a message, whatever its content: it is simply the current value of request. The other worker processes, i.e., those that are not the speaker, send a request only if it is nonzero, that is, only if more random numbers are actually needed.

Now let us have a look at the computation itself.

We begin by receiving a sequence of random integers from the server. The package is of length CHUNKSIZE, the data items are all of type MPI_INT, the sender is server and the tag is REPLY. The communication takes place within the world communicator, because the server does not belong to the workers communicator:

      MPI_Recv(rands, CHUNKSIZE, MPI_INT, server, REPLY, world, &stat);

Now we convert the whole sequence to points $(x, y)$, check which of them fall within the circle of radius 1, i.e., satisfy $x^2 + y^2 < 1$, and which fall outside, and increment the appropriate counters (in and out).

      for (i=0; i < CHUNKSIZE; ) {
        x = (((double) rands[i++])/max) * 2 - 1;
        y = (((double) rands[i++])/max) * 2 - 1;
        if (x*x + y*y < 1.0)
          in++;
        else
          out++;
      }
Why is rands[i++]/max multiplied by 2 and then 1 subtracted from the result? Function random returns pseudo-random numbers between 0 and $2^{31} - 1$, and INT_MAX is $2^{31} - 1$. So the largest x or y is going to be 1 and the smallest is going to be -1.
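
Written out, a random integer $r$ drawn from $[0, 2^{31}-1]$ is mapped to $x = 2r/(2^{31}-1) - 1$: $r = 0$ yields $x = -1$ and $r = 2^{31}-1$ yields $x = 1$, so the points fill the square $[-1, 1] \times [-1, 1]$.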

The next step is to exchange our own local in and out with all other worker processes. This is done by the two operations:

      MPI_Allreduce(&in, &totalin, 1, MPI_INT, MPI_SUM, workers);
      MPI_Allreduce(&out, &totalout, 1, MPI_INT, MPI_SUM, workers);
These are not ordinary reduce operations; they are all-reduce operations. They work like a normal reduce, but instead of the result being known only to the root process, it becomes known to all processes. So every worker process now ends up with the sum of all ins in its totalin and the sum of all outs in its totalout. This communication takes place only within the workers communicator, so the random number server process is not bothered.
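
For reference, the prototype of MPI_Allreduce (in its original form, without the const qualifier that later versions of the standard add to the send buffer) is:

  int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                    MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);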

Finally the approximation $\pi'$ can be evaluated as follows:

      Pi = (4.0*totalin)/(totalin + totalout);
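
The points (x, y) are distributed uniformly over the square $[-1, 1] \times [-1, 1]$, whose area is 4, while the circle of radius 1 has area $\pi$. The fraction of points landing inside the circle therefore approximates $\pi/4$, which is why multiplying totalin/(totalin + totalout) by 4 yields an estimate of $\pi$.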

This is not the end of what the worker processes have to do. After they have finished all their work, they must release the communicator. This is done by calling:

    MPI_Comm_free(&workers);
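
The prototype of MPI_Comm_free takes a pointer to the communicator handle; on return the handle is set to MPI_COMM_NULL:

  int MPI_Comm_free(MPI_Comm *comm);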

If you look at this program in ``Using MPI'' by Gropp, Lusk, and Skjellum, you'll find that MPI_Comm_free(&workers) is called just ahead of MPI_Finalize and outside of the

  if (myid == server) {
     ...
  }
  else {
     ...
  }
clause. This is a bug, because it means that the server process, which does not belong to the workers communicator, will call MPI_Comm_free(&workers) too. But for that process workers holds MPI_COMM_NULL, and MPI_COMM_NULL is not a valid argument to MPI_Comm_free, so all hell is going to break loose.
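
One way to avoid the problem (a sketch, not the book's code) is to free the communicator only on the processes that actually belong to it, for example by guarding the call:

  if (workers != MPI_COMM_NULL)
    MPI_Comm_free(&workers);

In our version of the program this is achieved simply by keeping the call inside the worker branch of the clause.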

The last step before quitting the program is to write out the total number of points generated and how many of them fell inside and outside the circle. This is done by the speaker process:

  if (myid == 0) 
    printf( "\npoints: %d\nin: %d, out: %d\n", 
            totalin+totalout, totalin, totalout);

