next up previous index
Next: Exercises Up: Basic MPI Previous: Exercises

Dividing the Pie

We are now going to analyze the program you should have run already, cpi. The source of this program is distributed with MPICH2, and it lives in the directory /N/hpc/mpich2/src/mpich2-0.94b1/examples. The file name is cpi.c. For your convenience and because it is an open software program, I also quote it below:
gustav@bh1 $ pwd
gustav@bh1 $ cat cpi.c
#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f(double);

double f(double a)
    return (4.0 / (1.0 + a*a));

int main(int argc,char *argv[])
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int  namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];


    fprintf(stdout,"Process %d of %d is on %s\n",
            myid, numprocs, processor_name);

    n = 10000;                  /* default # of rectangles */
    if (myid == 0)
        startwtime = MPI_Wtime();

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h   = 1.0 / (double) n;
    sum = 0.0;
    /* A slightly better approach starts from large i and works back */
    for (i = myid + 1; i <= n; i += numprocs)
        x = h * ((double)i - 0.5);
        sum += f(x);
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        endwtime = MPI_Wtime();
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", endwtime-startwtime);         

    return 0;
gustav@bh1 $
This program calculates $\pi$ by using the formula

\begin{displaymath}\pi = 4 \int^1_0 \frac{1}{1 + x^2}\,\mathrm{d}x
\end{displaymath} (5.1)

The calculation is numerical, i.e., the segment [0,1] is divided into 10,000 equal fragments. Within each fragment we evaluate $\frac{4}{1 + x^2}$ for the middle of the fragment (this is the height of the rectangle) and multiply it by the width of the fragment to obtain the area of the rectangle. Then we add all the areas together and this is our integral, that is also the approximate value of $\pi$.

The program begins with the usual incantations to MPI_Init, MPI_Comm_size, MPI_Comm_rank and MPI_Get_processor_name. Then each process writes on standard output its rank number (which is called myid here), the size of the communicator (which is called numprocs here) and the name of the processor it runs on.

We have not talked about it so far, but this writing on standard output is far from trivial. Every process writes on its own standard output. In some early versions of MPI, e.g., in LAM MPI, these writes were simply lost. But MPICH2 engine collects all standard outputs from its processes and combines them together on the so called console node. This is always the node whose rank is zero, the node on which we run mpdboot. The output you see on the PBS output file is the output from the MPICH2 console node.
Now we hardwire the number of division of segment [0,1] into n and the console node begins the timing of the program by calling function  MPI_Wtime. This function measures wall-clock time down to a microsecond, sic! It is very accurate indeed.

Now the value of n is broadcast to all processes. This is completely unnecessary in this particular program, because all nodes know what n is from the very beginning, but this program has been modified from its interactive predecessor, which used to read n from standard input. It was the console node that used to do the reading.

Now every process calculates the width of the rectangles:

h = 1.0 / (double) n;
Every process initializes its own local variable sum to zero, and then calls $f(x) = \frac{4}{1 + x^2}$ on$\ldots$ the rectangle that corresponds to its own rank number. Then it jumps to the rectangle that corresponds to its rank number plus the number of processes in the pool, and so on. The values of $f(x) = \frac{4}{1 + x^2}$ are added for all rectangles and only at the end they are multiplied by the width of the rectangle. This, of course, saves on unnecessary multiplications.

Now we encounter a new  operation, MPI_Reduce. This operation takes as its input the content of mypi for each process in the pool (the first argument). We tell it that there is one item (the third argument) of type MPI_DOUBLE  (the fourth argument) in there. We are then telling it that it should perform summation over all instances of mypi (the fifth  argument, MPI_SUM) and that it should write the result of the operation on pi (the second argument) on the root process (the sixth argument), which is this case is the console process, i.e., process of rank zero. The whole operation is performed within the MPI_COMM_WORLD communicator.

At the end of this operation only the root (console) process has the right answer in its location that corresponds to pi. But this is fine. In the next clause we instruct the console process to measure the wall clock time again, then print the value of $\pi$ on standard output together with the estimate of the accuracy of the computation. Then the console process also writes how long it took to carry out the computation.

Here is the output of the program.

gustav@bh1 $ pwd 
gustav@bh1 $ cat mpi_out
Local MPD console on bc89
Process 0 of 8 is on bc89
Process 1 of 8 is on bc40
Process 2 of 8 is on bc42
Process 3 of 8 is on bc41
Process 4 of 8 is on bc44
Process 5 of 8 is on bc43
Process 6 of 8 is on bc88
Process 7 of 8 is on bc46
pi is approximately 3.1415926544231247, Error is 0.0000000008333316
wall clock time = 0.009541
gustav@bh1 $

MPI_SUM is not the only operation you can use with MPI_Reduce. MPI provides a number of predefined reduction operations that can be used in this context, and you can define your own operations too. The predefined ones are:


next up previous index
Next: Exercises Up: Basic MPI Previous: Exercises
Zdzislaw Meglicki