All message passing operations and file access operations we have
discussed in this course so far are blocking. This means that
when, e.g., you issue a call to MPI_Send, the call
returns only after all the data in the send buffer has been
sent or copied out of the buffer by MPI, meaning that it is now safe to perform
other operations on the send buffer, e.g., you may write to it
again. Similarly, when you issue
a call to MPI_Recv, the call returns only
after all the data that you expect to receive has been written
into the receive buffer, meaning that it is now
safe to perform other operations on the receive buffer, e.g.,
it is safe to read it.
The file IO semantics are similar. The blocking IO operations
such as MPI_File_write and MPI_File_read do
not return until all the data has been taken out of the
write buffer, or all the data has been written into the
read buffer, so that it is now safe to reuse the former or to read
the latter.
Blocking of these function calls is local, i.e., they block only for as long as the send, write, receive, or read buffers are in use by the communication functions. There is another type of blocking, which is markedly more severe. It is called synchronous or global blocking. If you send a message with MPI_Ssend, the function returns only after a matching receive has been posted on the other side and the receiving process has started receiving the data into its receive buffer.
On the other hand, we also have fully non-blocking operations such as MPI_Isend and MPI_Irecv, which merely initiate the send or the receive and return right away, even while their send or receive buffers are still in use by the data transfer. Of course, while the transfer is under way, you must not touch the buffers: do not write to a send buffer, and do not read from or write to a receive buffer.
What do we need such non-blocking operations for? They exist because message passing and file access operations are extremely slow by computing standards. In the time it takes to read data from a file, or to send data to other processes, you may be able to perform thousands, even millions, of arithmetic operations. So if every nanosecond counts, you want to be able to do just this: compute while data transfer operations execute in the background.
But how are you going to know that a particular data transfer operation you have initiated has completed?
All non-blocking MPI functions take an additional argument of
type MPI_Request. It is yet another opaque MPI data
type. Once you have initiated a data transfer, you get this
request back, and you can then call
MPI_Test,
which takes your request as an argument and checks whether the
corresponding communication operation has completed. The synopsis of
MPI_Test is:
int MPI_Test(MPI_Request *request, int *completed, MPI_Status *status)

The value of
completed is set to true (a nonzero value) when the communication
operation identified by request has completed. Otherwise
completed is set to false (zero). Once the operation has completed, the status variable
may be inspected for other details pertaining to the operation, e.g., the rank of
a sender process or the number of items received.
If you have finished all you wanted to do while the data is still being transferred, you can instead call MPI_Wait, the synopsis of which is

int MPI_Wait(MPI_Request *request, MPI_Status *status)

There is no completed argument in MPI_Wait. This function returns only
after the operation identified by request has completed.
MPI-IO supports similar non-blocking functions for writing to and reading from files. The functions are MPI_File_iwrite and MPI_File_iread. Their respective synopses are:

int MPI_File_iwrite(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Request *request)
int MPI_File_iread(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Request *request)

Observe that unlike
MPI_File_read, the function MPI_File_iread does not take
status as an argument. You have to call MPI_Wait or MPI_Test to
get hold of the status in this case. The reason for this should be obvious: when
MPI_File_iread returns, there is no status to read yet. It comes into existence only
after the read operation has completed.
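The MPI-2 standard has no non-blocking versions of collective file access operations
like MPI_File_read_all and MPI_File_write_all. Non-blocking collective versions, MPI_File_iread_all and MPI_File_iwrite_all, were added only in MPI-3.1.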
You can actually express MPI_File_write and MPI_File_read as combinations
of MPI_File_iwrite and MPI_File_iread with MPI_Wait. The following:

MPI_File_iwrite(fh, buf, count, datatype, &request);
MPI_Wait(&request, &status);

is equivalent to

MPI_File_write(fh, buf, count, datatype, &status);

and

MPI_File_iread(fh, buf, count, datatype, &request);
MPI_Wait(&request, &status);

is equivalent to

MPI_File_read(fh, buf, count, datatype, &status);