
Contiguous Hyperslabs

The following example code, also taken from the NCSA HDF5 Tutorial, introduces almost nothing new, but when everything you've learnt so far comes together, you end up, miraculously, with a parallel write onto an HDF5 dataset that sits in an HDF5 file.

This time, before I show you the program, I want to show you what it does.

The program lives in this directory

gustav@bh1 $ pwd
/N/B/gustav/src/I590/HDF5/mpi-io
gustav@bh1 $ ls
hyperslab_by_row.c
gustav@bh1 $
and is compiled and linked with the h5cc wrapper, which knows all about MPI and MPI-IO too.
gustav@bh1 $ h5cc -o hyperslab_by_row hyperslab_by_row.c
gustav@bh1 $
I install the program in my ~/bin so that mpiexec will find it. Then I go to my GPFS directory.
gustav@bh1 $ mv hyperslab_by_row ~/bin
gustav@bh1 $ cd /N/gpfs/gustav
gustav@bh1 $ ls
t_mpio_1wMr  test  tests
gustav@bh1 $
MPD is already running on 26 nodes, but I'm going to use only 4 of them for this run:
gustav@bh1 $ mpdtrace | wc
     26      26     129
gustav@bh1 $ mpiexec -n 4 hyperslab_by_row
gustav@bh1 $
The job created an HDF5 file SDS_row.h5 in this directory, and I am going to inspect it with h5dump.
gustav@bh1 $ ls
SDS_row.h5  t_mpio_1wMr  test  tests
gustav@bh1 $ h5dump SDS_row.h5
HDF5 "SDS_row.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
      DATA {
         10, 10, 10, 10, 10,
         10, 10, 10, 10, 10,
         11, 11, 11, 11, 11,
         11, 11, 11, 11, 11,
         12, 12, 12, 12, 12,
         12, 12, 12, 12, 12,
         13, 13, 13, 13, 13,
         13, 13, 13, 13, 13
      }
   }
}
}
gustav@bh1 $
You will see, when we get to analyzing the program, that process 0 wrote:
         10, 10, 10, 10, 10,
         10, 10, 10, 10, 10,
process 1 wrote:
         11, 11, 11, 11, 11,
         11, 11, 11, 11, 11,
process 2 wrote:
         12, 12, 12, 12, 12,
         12, 12, 12, 12, 12,
and process 3 wrote:
         13, 13, 13, 13, 13,
         13, 13, 13, 13, 13
The writes were collective, i.e., HDF5 used MPI_File_write_all to write the data to this file, and, since the file lives on GPFS, the writes presumably occurred in parallel.
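Before we look at the program, it helps to see the arithmetic behind this partitioning. The following little program is my own sketch, not part of the tutorial code; it merely reproduces the count and offset computation that each process performs for the 8 x 5 dataset, assuming a run on 4 processes:

#include <stdio.h>

#define NX 8            /* rows in the dataset    */
#define NY 5            /* columns in the dataset */

int main(void)
{
    int mpi_size = 4;   /* pretend we run on 4 processes */
    int rank;

    for (rank = 0; rank < mpi_size; rank++) {
        int count0  = NX / mpi_size;    /* rows written by this rank  */
        int offset0 = rank * count0;    /* first row this rank writes */
        printf("rank %d writes rows %d..%d and fills them with %d\n",
               rank, offset0, offset0 + count0 - 1, rank + 10);
    }
    return 0;
}

With four processes each rank gets 8/4 = 2 rows of 5 integers, which is exactly the pattern you see in the h5dump output above.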

So now let's see how this is done. Here is the program:

/*  
 *  This example writes data to the HDF5 file by rows.
 *  The number of processes is assumed to divide NX, i.e., to be 1, 2, 4 or 8.
 */
 
#include <stdlib.h>   /* for malloc and free */
#include "hdf5.h"

#define H5FILE_NAME     "SDS_row.h5"
#define DATASETNAME 	"IntArray" 
#define NX     8                      /* dataset dimensions */
#define NY     5 
#define RANK   2

int
main (int argc, char **argv)
{
    /*
     * HDF5 APIs definitions
     */ 	
    hid_t       file_id, dset_id;         /* file and dataset identifiers */
    hid_t       filespace, memspace;      /* file and memory dataspace identifiers */
    hsize_t     dimsf[2];                 /* dataset dimensions */
    int         *data;                    /* pointer to data buffer to write */
    hsize_t	count[2];	          /* hyperslab selection parameters */
    hssize_t	offset[2];
    hid_t	plist_id;                 /* property list identifier */
    int         i;
    herr_t	status;

    /*
     * MPI variables
     */
    int mpi_size, mpi_rank;
    MPI_Comm comm  = MPI_COMM_WORLD;
    MPI_Info info  = MPI_INFO_NULL;

    /*
     * Initialize MPI
     */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);  
 
    /* 
     * Set up file access property list with parallel I/O access
     */
     plist_id = H5Pcreate(H5P_FILE_ACCESS);
     H5Pset_fapl_mpio(plist_id, comm, info);

    /*
     * Create a new file collectively and release property list identifier.
     */
    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);
   

    /*
     * Create the dataspace for the dataset.
     */
    dimsf[0] = NX;
    dimsf[1] = NY;
    filespace = H5Screate_simple(RANK, dimsf, NULL); 

    /*
     * Create the dataset with default properties and close filespace.
     */
    dset_id = H5Dcreate(file_id, DATASETNAME, H5T_NATIVE_INT, filespace,
			H5P_DEFAULT);
    H5Sclose(filespace);

    /* 
     * Each process defines dataset in memory and writes it to the hyperslab
     * in the file.
     */
    count[0] = dimsf[0]/mpi_size;
    count[1] = dimsf[1];
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
    memspace = H5Screate_simple(RANK, count, NULL);

    /*
     * Select hyperslab in the file.
     */
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /*
     * Initialize data buffer 
     */
    data = (int *) malloc(sizeof(int)*count[0]*count[1]);
    for (i=0; i < count[0]*count[1]; i++) {
        data[i] = mpi_rank + 10;
    }

    /*
     * Create property list for collective dataset write.
     */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
    
    status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace,
		      plist_id, data);
    free(data);

    /*
     * Close/release resources.
     */
    H5Dclose(dset_id);
    H5Sclose(filespace);
    H5Sclose(memspace);
    H5Pclose(plist_id);
    H5Fclose(file_id);
 
    MPI_Finalize();

    return 0;
}
The program begins with the already traditional MPI incantations:
    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);
Then we immediately switch to HDF5. We create a file access property list that contains MPI information and we create the file itself:
    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, info);
    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);
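The info argument is MPI_INFO_NULL here, but this is also the place where MPI-IO hints could be handed to the underlying file driver. The following fragment is a hedged sketch of mine, not part of the tutorial program: the hint names are illustrative ROMIO-style hints, and whether a given MPI library recognizes them, or whether they help on GPFS, is something you would have to check.

    /* build an info object with hints instead of passing MPI_INFO_NULL */
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");   /* collective buffering   */
    MPI_Info_set(info, "cb_buffer_size", "1048576");  /* 1 MB collective buffer */

    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, info);
    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);
    MPI_Info_free(&info);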
Now we create the dataspace and the dataset:
#define H5FILE_NAME     "SDS_row.h5"
#define DATASETNAME 	"IntArray" 
#define NX     8  
#define NY     5 
#define RANK   2
...
    dimsf[0] = NX;
    dimsf[1] = NY;
    filespace = H5Screate_simple(RANK, dimsf, NULL); 
    dset_id = H5Dcreate(file_id, DATASETNAME, H5T_NATIVE_INT, filespace,
			H5P_DEFAULT);
    H5Sclose(filespace);
The dataset is going to have 8 rows and 5 columns.
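If you wanted the program to confirm the shape for itself, rather than trusting the defines, you could query the dataspace back from the dataset. This is a small sketch of mine, not part of the tutorial code, and it assumes <stdio.h> has been included:

    hsize_t dims_out[2];
    hid_t   space;

    /* ask the dataset for its dataspace and read the dimensions back */
    space = H5Dget_space(dset_id);
    H5Sget_simple_extent_dims(space, dims_out, NULL);
    if (mpi_rank == 0)
        printf("dataset is %d rows by %d columns\n",
               (int) dims_out[0], (int) dims_out[1]);
    H5Sclose(space);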

Now each process defines a memory dataspace for its own portion of the data and selects the hyperslab of the dataset in the file onto which it is going to write. The two NULL arguments passed to H5Sselect_hyperslab leave the stride and the block size at their defaults of one element each:

    count[0] = dimsf[0]/mpi_size;
    count[1] = dimsf[1];
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
    memspace = H5Screate_simple(RANK, count, NULL);
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
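The header comment of the program warns about the number of processes: the row split only comes out even when mpi_size divides NX, which with NX = 8 means 1, 2, 4 or 8 processes. A small guard of my own, not present in the tutorial code (it assumes <stdio.h> is included), would make the assumption explicit:

    /* sanity check: the row partitioning assumes mpi_size divides NX */
    if (dimsf[0] % mpi_size != 0) {
        if (mpi_rank == 0)
            fprintf(stderr, "the number of processes must divide %d\n",
                    (int) dimsf[0]);
        MPI_Abort(comm, 1);
    }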

Now each process initializes its own data:

    data = (int *) malloc(sizeof(int)*count[0]*count[1]);
    for (i=0; i < count[0]*count[1]; i++) {
        data[i] = mpi_rank + 10;
    }
and then we get back to HDF5. Each process creates a data transfer property list, in which we request that the following write be collective:
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
The function H5Pset_dxpl_mpio is the only new function in this program. Apart from H5FD_MPIO_COLLECTIVE you can also request H5FD_MPIO_INDEPENDENT, which is actually the default.
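If you wanted independent transfers instead, or simply wanted to confirm what you asked for, the transfer property list can be set and queried as in the sketch below; this is my own fragment, not something the tutorial program does (it assumes <stdio.h> is included):

    H5FD_mpio_xfer_t xfer_mode;

    /* request independent, i.e. non-collective, transfers instead */
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_INDEPENDENT);

    /* read the setting back, just to check */
    H5Pget_dxpl_mpio(plist_id, &xfer_mode);
    if (xfer_mode == H5FD_MPIO_INDEPENDENT && mpi_rank == 0)
        printf("transfers will be independent\n");

In our program, of course, the collective setting requested above stays in force.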

Now we are ready to write the data:

    status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace,
		      plist_id, data);
Observe that filespace corresponds to a different hyperslab for each process, so that they don't step on each other's toes.
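The program stores the return value of H5Dwrite in status but never inspects it. HDF5 calls return a negative value on failure, so a check along these lines could be added; this is my own sketch, not part of the tutorial code (it assumes <stdio.h> is included):

    status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace,
                      plist_id, data);
    if (status < 0) {
        fprintf(stderr, "process %d: H5Dwrite failed\n", mpi_rank);
        MPI_Abort(comm, 1);
    }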

After the write returns, we free the array data, which was malloced earlier, then close the dataset, the file dataspace, the memory dataspace, the property list, and finally the file itself.

And, at the end, we call MPI_Finalize, because it is an MPI program after all.

So this is how it works: HDF5 implements parallel I/O at the dataset level, and the partitioning of the file is accomplished by means of hyperslab selection.
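If you would rather check the result programmatically than with h5dump, each process can read its own rows back with the same hyperslab machinery, calling H5Dread where the writer called H5Dwrite. The program below is a sketch of mine, written against the same HDF5 1.6-style API as the tutorial code; it is not part of the NCSA example.

#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

#define H5FILE_NAME "SDS_row.h5"
#define DATASETNAME "IntArray"
#define NX 8
#define NY 5
#define RANK 2

int main(int argc, char **argv)
{
    hid_t    file_id, dset_id, filespace, memspace, plist_id;
    hsize_t  count[2];
    hssize_t offset[2];
    int      *data, i, mpi_size, mpi_rank;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);

    /* open the file with the MPI-IO file driver, just as the writer did */
    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, MPI_INFO_NULL);
    file_id = H5Fopen(H5FILE_NAME, H5F_ACC_RDONLY, plist_id);
    H5Pclose(plist_id);

    /* open the dataset and select the same row hyperslab we wrote */
    dset_id = H5Dopen(file_id, DATASETNAME);
    count[0] = NX / mpi_size;
    count[1] = NY;
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
    memspace  = H5Screate_simple(RANK, count, NULL);
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /* collective read of this process' rows */
    data = (int *) malloc(sizeof(int) * count[0] * count[1]);
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
    H5Dread(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, data);

    /* every value this process reads should be mpi_rank + 10 */
    for (i = 0; i < count[0] * count[1]; i++)
        if (data[i] != mpi_rank + 10)
            fprintf(stderr, "process %d: unexpected value %d\n",
                    mpi_rank, data[i]);

    free(data);
    H5Pclose(plist_id);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset_id);
    H5Fclose(file_id);
    MPI_Finalize();
    return 0;
}

Compiled with h5cc and run under mpiexec just like the writer, this should complete silently if the write did what we expect.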


Zdzislaw Meglicki
2004-04-29