
Asynchronous IO

All message passing operations and file access operations we have discussed in this course so far are blocking. This means that when, e.g., you issue a call to MPI_Send, the call returns only after all the data in the send buffer has been sent, so that it is then safe to perform other operations on the send buffer, e.g., you may write to it again. Similarly, when you issue a call to MPI_Recv, the call returns only after all the data you expect to receive has been written into the receive buffer, so that it is then safe to perform other operations on the receive buffer, e.g., it is safe to read it.

The file IO semantics are similar. The blocking IO operations such as MPI_File_write and MPI_File_read do not return until all the data has been taken out of the write buffer, or all the data has been written into the read buffer, so that it is then safe to use or re-use either one or the other.

Blocking of these function calls is local, i.e., they block only for as long as the send, write, receive, or read buffers are in use by the communication functions. There is another type of blocking, which is markedly more severe. It is called synchronous or global blocking. If you send a message with MPI_Ssend, the function will return only after a matching receive has been posted on the other side, and the receiving process has started reading the data into its receive buffer.

On the other hand, we also have fully non-blocking operations such as MPI_Isend and MPI_Irecv, which merely initiate the send or the receive and return right away, even while their send or receive buffers are still in use by the data transfer. Of course, while the transfer is under way, you must not touch the buffers.

What do we need such non-blocking operations for? The reason for their existence is that message passing and file access operations are extremely slow by computing standards. In the time it takes to read data from a file, or to send data to other processes, you may be able to perform thousands, even millions of arithmetic operations. So if every nanosecond counts, you want to be able to do just this: compute, while data transfer operations execute in the background.

But how are you going to know that a particular data transfer operation you have initiated has completed?

All non-blocking MPI functions take an additional argument of type MPI_Request . It is yet another opaque MPI data type. Once you have initiated a data transfer, you get this request back; you can then call MPI_Test, which takes the request as an argument and checks whether the corresponding communication operation has completed. The synopsis of MPI_Test is:

int MPI_Test(MPI_Request *request, int *completed, MPI_Status *status)
The value of completed is set to TRUE when the communication operation pointed to by request has indeed completed. Otherwise completed is FALSE. Additionally the status variable may be inspected for other details pertaining to the operation, e.g., the rank number of a sender process or the number of items received.

If you have finished everything you wanted to do while the data was being transferred, you can instead issue a call to MPI_Wait, the synopsis of which is

int MPI_Wait(MPI_Request *request, MPI_Status *status)
There is no completed argument in MPI_Wait. This function returns only after the operation pointed to by request has completed.

MPI-IO supports similar non-blocking functions for writing on and reading from files. The functions are MPI_File_iwrite  and MPI_File_iread . Their respective synopses are:

int MPI_File_iwrite(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Request *request) 
int MPI_File_iread(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Request *request)
Observe that unlike MPI_File_read, function MPI_File_iread does not take status as an argument. You have to call MPI_Wait or MPI_Test to get hold of status in this case. The reason for this should be obvious. When MPI_File_iread returns, there is no status yet to read. It will only come into existence after the reading operation has completed.

There are no asynchronous versions of collective file access operations like MPI_File_read_all and MPI_File_write_all.

You can actually express MPI_File_write and MPI_File_read as combinations of MPI_File_iwrite and MPI_File_iread with MPI_Wait. The following:

MPI_File_iwrite(fh, buf, count, datatype, &request);
MPI_Wait(&request, &status);
is equivalent to
MPI_File_write(fh, buf, count, datatype, &status);
Similarly,
MPI_File_iread(fh, buf, count, datatype, &request);
MPI_Wait(&request, &status);
is equivalent to
MPI_File_read(fh, buf, count, datatype, &status);

Zdzislaw Meglicki