
The Slave Program

Let us now have a look at the slave program, which is enclosed in

else { /* I am not the master */
   blah... blah... blah...
}

The slave processes begin their career by opening a log file in their own local /tmp directory. In this program the file is called simply gustav_log, because I couldn't think of anything else - but in a serious application you may have to put in some effort and generate a unique file name with some generic prefix, perhaps.
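
For concreteness, here is a minimal sketch of how such a file might be opened - the helper name log_name is hypothetical, but log_file is the stream used by the fprintf calls quoted below:

   /* Sketch only: open a per-node log file in /tmp.  The name
      log_name is hypothetical; a unique name could be built,
      e.g., from the process rank. */
   char log_name[64];
   FILE *log_file;
   sprintf(log_name, "/tmp/gustav_log");
   log_file = fopen(log_name, "w");
   if (log_file == NULL) MPI_Abort(MPI_COMM_WORLD, 1);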

In this program all processes write a lot, so that you can see what they have done. If they all wrote to the same standard output as the master process, the information would get garbled and become quite useless.

The next step is to receive vector b, which has been sent by the master.

Why do we call MPI_Bcast in this program twice? Well, we don't. It only appears so, because MPI_Bcast is printed twice within the text of the program. But the call to MPI_Bcast issued within the master part would not have been executed by the slaves, so here we have to type the call again, separately, for the slave processes.
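
For reference, the slave-side call is the same collective call the master makes; a sketch, using the names b, COLS, and MASTER_RANK that appear elsewhere in the program, and assuming b is a vector of COLS integers, might look like this:

   /* Every process, master and slave alike, must take part in
      the broadcast for the data to be distributed. */
   MPI_Bcast (b, COLS, MPI_INT, MASTER_RANK, MPI_COMM_WORLD);
   fprintf(log_file, "received broadcast from %d\n", MASTER_RANK);
   fflush(log_file);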

After they have received their copies of b, they log this event on their respective log files and wait for the first batch of jobs to be sent to them by the master process.

Having received their first row of A, they log it on gustav_log and commence work.
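
That initial receive might look much like the one inside the loop shown further down - a sketch, assuming the same int_buffer and status variables used there:

      MPI_Recv (int_buffer, COLS, MPI_INT, MASTER_RANK, MPI_ANY_TAG,
                MPI_COMM_WORLD, &status);   /* the tag carries the row number */
      fprintf(log_file, "received a message from %d, tag %d\n",
              status.MPI_SOURCE, status.MPI_TAG);
      fflush(log_file);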

The work is done within the large

      while (status.MPI_TAG != ROWS) { /* The job is not finished */
         blah... blah... blah...
      }
loop. Every time a slave process receives a message from the master process, it checks whether the tag of the message is a valid row number, i.e., different from ROWS. Remember that having received a message with tag ROWS implies the termination of the contract!

If the tag is kosher, the slave process does the following:

         row = status.MPI_TAG; sum = 0;
         for (i = 0; i < COLS; i++) sum = sum + int_buffer[i] * b[i];
         int_buffer[0] = sum;
         MPI_Send (int_buffer, 1, MPI_INT, MASTER_RANK, row, MPI_COMM_WORLD);
         fprintf(log_file, "sent row %d to %d\n", row, MASTER_RANK);
         fflush(log_file);
The row number is extracted from the tag of the message. The slave process then evaluates $\sum_j A_{ij} b_j$ and sends the result back to the master using the same tag, so that the master knows which row the answer corresponds to.

This operation, again, is logged on /tmp/gustav_log.

Finally, the slave process waits for another message from the master process, reads it, and logs this operation on gustav_log:

         MPI_Recv (int_buffer, COLS, MPI_INT, MASTER_RANK, MPI_ANY_TAG,
                   MPI_COMM_WORLD, &status);
         fprintf(log_file, "received a message from %d, tag %d\n",
                 status.MPI_SOURCE, status.MPI_TAG);
         fflush(log_file);

Then it's back to the top of the loop: check the tag; if the tag is OK, perform the computation; otherwise leave the loop and hit MPI_Finalize().
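
Putting the fragments together, the skeleton of the slave's work loop looks roughly as follows - this is a sketch assembled from the excerpts above, with declarations and the final clean-up abbreviated:

      while (status.MPI_TAG != ROWS) { /* tag ROWS means: no more work */
         row = status.MPI_TAG;         /* the tag encodes the row number */
         sum = 0;
         for (i = 0; i < COLS; i++) sum = sum + int_buffer[i] * b[i];
         int_buffer[0] = sum;
         MPI_Send (int_buffer, 1, MPI_INT, MASTER_RANK, row, MPI_COMM_WORLD);
         fprintf(log_file, "sent row %d to %d\n", row, MASTER_RANK);
         fflush(log_file);
         MPI_Recv (int_buffer, COLS, MPI_INT, MASTER_RANK, MPI_ANY_TAG,
                   MPI_COMM_WORLD, &status);
         fprintf(log_file, "received a message from %d, tag %d\n",
                 status.MPI_SOURCE, status.MPI_TAG);
         fflush(log_file);
      }
      fprintf(log_file, "exiting on  tag %d\n", status.MPI_TAG);
      fflush(log_file);
      /* ... close the log file and, eventually, call MPI_Finalize() ... */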

It is instructive to have a look at one of the log files generated by the slave processes.

When I ran this program on our SP, I got the following in my bank.err file:

INFO: 0031-119  Host sp40.ucs.indiana.edu allocated for task 0
INFO: 0031-119  Host sp22.ucs.indiana.edu allocated for task 1
INFO: 0031-119  Host sp19.ucs.indiana.edu allocated for task 2
INFO: 0031-119  Host sp17.ucs.indiana.edu allocated for task 3
INFO: 0031-119  Host sp42.ucs.indiana.edu allocated for task 4
INFO: 0031-119  Host sp43.ucs.indiana.edu allocated for task 5
INFO: 0031-119  Host sp23.ucs.indiana.edu allocated for task 6
INFO: 0031-119  Host sp41.ucs.indiana.edu allocated for task 7
This tells me that my master ran on node sp40, and the slaves ran on nodes sp17, sp19, sp22, sp23, sp41, sp42, and sp43.

So let's go to, say, sp41, and have a look at what's in /tmp:

gustav@sp41:../SP 19:36:25 !501 $ cd /tmp
gustav@sp41:..//tmp 19:36:26 !502 $ ls
gustav_log                ssh-gustav                startd_unix_dgram_socket
gustav@sp41:..//tmp 19:36:28 !503 $ cat gustav_log
received broadcast from 0
received a message from 0, tag 6
sent row 6 to 0
received a message from 0, tag 44
sent row 44 to 0
received a message from 0, tag 50
sent row 50 to 0
received a message from 0, tag 56
sent row 56 to 0
received a message from 0, tag 62
sent row 62 to 0
received a message from 0, tag 68
sent row 68 to 0
received a message from 0, tag 74
sent row 74 to 0
received a message from 0, tag 81
sent row 81 to 0
received a message from 0, tag 87
sent row 87 to 0
received a message from 0, tag 93
sent row 93 to 0
received a message from 0, tag 99
sent row 99 to 0
received a message from 0, tag 100
exiting on  tag 100
gustav@sp41:..//tmp 19:36:35 !504 $
What we find from this log is that the process running on node sp41 received row number 6 initially and took a rather long time to return the answer to the master process; by the time it did, the other processes had nearly finished half of the matrix. But from that point onwards sp41 worked quite conscientiously on roughly every 6th row.

The beauty of the job queue paradigm is that it keeps all processes as busy as they can be, even if some of them have to cope with more load than others.


Zdzislaw Meglicki
2001-02-26