This program begins the same way mkrandpfile did, until we get to:
file_open_error = MPI_File_open(MPI_COMM_WORLD, filename,
                                MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
if (file_open_error != MPI_SUCCESS) {
    /* MPI_Error_string needs a buffer of at least MPI_MAX_ERROR_STRING chars */
    char error_string[MPI_MAX_ERROR_STRING];
    int length_of_error_string, error_class;

    /* Print the message for the error class first, then the more
       specific message for the error code itself. */
    MPI_Error_class(file_open_error, &error_class);
    MPI_Error_string(error_class, error_string, &length_of_error_string);
    printf("%3d: %s\n", my_rank, error_string);
    MPI_Error_string(file_open_error, error_string, &length_of_error_string);
    printf("%3d: %s\n", my_rank, error_string);
    MPI_Abort(MPI_COMM_WORLD, file_open_error);
}
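This time we open the file for reading only, and we check what
MPI_File_open has returned. (Unlike most MPI functions, file operations
return error codes instead of aborting, because the default error handler
for files is MPI_ERRORS_RETURN.) If there is no problem, i.e.,
file_open_error == MPI_SUCCESS, we go ahead and read the
file. But if there is a problem, we convert file_open_error
into two error messages, one for its error class and one for the specific
error code, print them on standard output, and call MPI_Abort.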
Assuming that MPI_File_open worked, we need to find out
how much data has to be read, so we check the size of the file
by calling MPI_File_get_size:
MPI_File_get_size(fh, &total_number_of_bytes);
where total_number_of_bytes must be of type MPI_Offset,
i.e., in our case, long long.
Now we evaluate how much data every process needs to read:
number_of_bytes_ll = total_number_of_bytes / pool_size;
/* If pool_size does not divide total_number_of_bytes evenly,
   the last process will have to read more data, i.e., to the
   end of the file. */
max_number_of_bytes_ll =
    number_of_bytes_ll + total_number_of_bytes % pool_size;
Depending on the length of the file and the number of processes,
the division of the former by the latter may or may not be exact.
If it isn't, then max_number_of_bytes_ll will be larger than
number_of_bytes_ll by the remainder, total_number_of_bytes % pool_size,
which is at most pool_size - 1 bytes. We make the last process read
these extra bytes. Observe that both number_of_bytes_ll and
max_number_of_bytes_ll are long long. At this stage we don't know
whether they'll fit in an int.
Now we have the if statement:
if (max_number_of_bytes_ll < INT_MAX) {
    blah... blah... blah...
}
else {
    if (i_am_the_master) {
        printf("Not enough memory to read the file.\n");
        printf("Consider running on more nodes.\n");
    }
} /* of if(max_number_of_bytes_ll < INT_MAX) */
MPI_File_close(&fh);
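This statement checks, right at the top, whether max_number_of_bytes_ll
will fit into an int. This matters because we are going to read the
data the same way we wrote it, i.e., in one large gulp into a single
sufficiently long array, and the count argument of MPI_File_read is an int.
If max_number_of_bytes_ll is too large, the master process prints a
message suggesting a run on more nodes, and all processes close the file
right away without reading anything.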
Now let's see what happens inside the top clause of the if
statement.
First, each process converts number_of_bytes_ll to an ordinary int,
suitable for passing to malloc and MPI_File_read; the exception is the
last process, which converts max_number_of_bytes_ll instead. Then they
all call malloc:
if (my_rank == last_guy)
    number_of_bytes = (int) max_number_of_bytes_ll;
else
    number_of_bytes = (int) number_of_bytes_ll;
read_buffer = (char*) malloc(number_of_bytes);
Now every process figures out its own offset in the file and goes there:
my_offset = (MPI_Offset) my_rank * number_of_bytes_ll;
#ifdef DEBUG
printf("%3d: my offset = %lld\n", my_rank, my_offset);
#endif
MPI_File_seek(fh, my_offset, MPI_SEEK_SET);
MPI_Barrier(MPI_COMM_WORLD);
and then they all meet at the barrier.
Now we are ready to start the read, time it, and find out whether and how the file pointers have advanced as a result:
start = MPI_Wtime();
MPI_File_read(fh, read_buffer, number_of_bytes, MPI_BYTE, &status);
finish = MPI_Wtime();
MPI_Get_count(&status, MPI_BYTE, &count);
#ifdef DEBUG
printf("%3d: read %d bytes\n", my_rank, count);
#endif
MPI_File_get_position(fh, &my_offset);
#ifdef DEBUG
printf("%3d: my offset = %lld\n", my_rank, my_offset);
#endif
Function MPI_File_read reads number_of_bytes items of type MPI_BYTE
into read_buffer from the file given by the file handle fh. Every
process reads the data starting from the position it was moved to by
MPI_File_seek, and as the reading progresses, its own individual file
pointer moves accordingly. MPI_Get_count then tells us how many bytes
were actually read, and MPI_File_get_position reports where the pointer
ended up.
Let us have a look at the positions of the pointers before and after the reading:
0: total_number_of_bytes = 34359738368
0: allocated 1073741824 bytes
0: my offset = 0
1: total_number_of_bytes = 34359738368
1: allocated 1073741824 bytes
1: my offset = 1073741824
2: total_number_of_bytes = 34359738368
2: allocated 1073741824 bytes
2: my offset = 2147483648
3: total_number_of_bytes = 34359738368
3: allocated 1073741824 bytes
3: my offset = 3221225472
4: total_number_of_bytes = 34359738368
4: allocated 1073741824 bytes
4: my offset = 4294967296
[...]
0: read 1073741824 bytes
0: my offset = 1073741824
1: read 1073741824 bytes
1: my offset = 2147483648
2: read 1073741824 bytes
2: my offset = 3221225472
3: read 1073741824 bytes
3: my offset = 4294967296
4: read 1073741824 bytes
4: my offset = 5368709120

Observe that process 0 started at offset 0 and progressed to offset 1073741824, having read exactly 1073741824 bytes. Process 1 started at offset 1073741824 and progressed to offset 2147483648, which is exactly where process 2 started from. In short, we have read every byte of the file without missing anything, not even the last few bytes in case the length of the file is not divisible by the number of processes: the last process mops them up.
Now we compute the bandwidth the same way we did for
mkrandpfile:
io_time = finish - start;
MPI_Allreduce(&io_time, &longest_io_time, 1, MPI_DOUBLE, MPI_MAX,
              MPI_COMM_WORLD);
if (i_am_the_master) {
    printf("longest_io_time = %f seconds\n", longest_io_time);
    printf("total_number_of_bytes = %lld\n", total_number_of_bytes);
    printf("transfer rate = %f MB/s\n",
           total_number_of_bytes / longest_io_time / MBYTE);
}
And that is it. All processes call MPI_Finalize and
exit.