cpi. The source of this program is distributed with
MPICH2, and it lives in the directory
/N/hpc/mpich2/src/mpich2-0.94b1/examples.
The file name is cpi.c.
For your convenience and because it is an open software program, I also quote
it below:
gustav@bh1 $ pwd
/N/hpc/mpich2/src/mpich2-0.94b1/examples
gustav@bh1 $ cat cpi.c
#include "mpi.h"
#include <stdio.h>
#include <math.h>
double f(double);
double f(double a)
{
return (4.0 / (1.0 + a*a));
}
int main(int argc,char *argv[])
{
int done = 0, n, myid, numprocs, i;
double PI25DT = 3.141592653589793238462643;
double mypi, pi, h, sum, x;
double startwtime = 0.0, endwtime;
int namelen;
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
MPI_Get_processor_name(processor_name,&namelen);
fprintf(stdout,"Process %d of %d is on %s\n",
myid, numprocs, processor_name);
fflush(stdout);
n = 10000; /* default # of rectangles */
if (myid == 0)
startwtime = MPI_Wtime();
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
h = 1.0 / (double) n;
sum = 0.0;
/* A slightly better approach starts from large i and works back */
for (i = myid + 1; i <= n; i += numprocs)
{
x = h * ((double)i - 0.5);
sum += f(x);
}
mypi = h * sum;
MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (myid == 0) {
endwtime = MPI_Wtime();
printf("pi is approximately %.16f, Error is %.16f\n",
pi, fabs(pi - PI25DT));
printf("wall clock time = %f\n", endwtime-startwtime);
fflush(stdout);
}
MPI_Finalize();
return 0;
}
gustav@bh1 $
This program calculates | (5.1) |
The program begins with the usual incantations to
MPI_Init, MPI_Comm_size, MPI_Comm_rank and
MPI_Get_processor_name. Then each process writes on standard output
its rank number (which is called myid here), the size of the
communicator (which is called numprocs here) and the name of the
processor it runs on.
We have not talked about it so far, but this writing on standard output
is far from trivial. Every process writes on its own standard
output. In some early versions of MPI, e.g., in LAM MPI, these writes
were simply lost. But MPICH2 engine collects all standard outputs from
its processes and combines them together on the so called console
node. This is always the node whose rank is zero, the node on which
we run mpdboot. The output you see on the PBS output file is
the output from the MPICH2 console node.
Now we hardwire the number of division of segment [0,1] into
n and the console node begins the timing of the program by
calling function
MPI_Wtime.
This function measures wall-clock time down to a microsecond, sic!
It is very accurate indeed.
Now the value of n is broadcast to all processes. This is completely
unnecessary in this particular program, because all nodes know what n
is from the very beginning, but this program has been modified from its
interactive predecessor, which used to read n from standard input.
It was the console node that used to do the reading.
Now every process calculates the width of the rectangles:
h = 1.0 / (double) n;Every process initializes its own local variable
sum to zero,
and then calls
Now we encounter a new operation,
MPI_Reduce.
This operation takes as its input the content of mypi for each
process in the pool (the first argument). We tell it that there is one item (the third argument)
of type MPI_DOUBLE (the fourth argument) in there. We are then telling it that it should
perform summation over all instances of mypi (the fifth
argument, MPI_SUM) and that it should write the result of the
operation on pi (the second argument) on the root process (the sixth
argument), which is this case is the console process, i.e., process of rank zero.
The whole operation is performed within the MPI_COMM_WORLD communicator.
At the end of this operation only the root (console) process has the right answer in
its location that corresponds to pi. But this is fine. In the next
clause we instruct the console process to measure the wall clock time again,
then print the value of
on standard output together with the estimate
of the accuracy of the computation. Then the console process also writes
how long it took to carry out the computation.
Here is the output of the program.
gustav@bh1 $ pwd /N/B/gustav/PBS gustav@bh1 $ cat mpi_out Local MPD console on bc89 Process 0 of 8 is on bc89 Process 1 of 8 is on bc40 Process 2 of 8 is on bc42 Process 3 of 8 is on bc41 Process 4 of 8 is on bc44 Process 5 of 8 is on bc43 Process 6 of 8 is on bc88 Process 7 of 8 is on bc46 pi is approximately 3.1415926544231247, Error is 0.0000000008333316 wall clock time = 0.009541 gustav@bh1 $
MPI_SUM is not the only operation you can use with
MPI_Reduce. MPI provides a number of predefined reduction operations that
can be used in this context, and you can define your own
operations too. The predefined ones are:
MPI_MAX |
MPI_MIN |
MPI_SUM |
MPI_PROD |
MPI_LAND |
MPI_BAND |
MPI_LOR |
MPI_BOR |
MPI_LXOR |
MPI_BXOR |
MPI_MAXLOC |
MPI_MINLOC |