
Compound Datatypes

All our HDF5 examples so far have been concerned with writing very elementary data to simple datasets. The data were all 32-bit integers to keep things simple, the only touch of excitement being little-endian versus big-endian. But what if we have some richly structured data, a mixture of characters, floats, integers and what not? Following the example of MPI, HDF5 provides a rich interface for the creation of new HDF5 datatypes. All functions in this family have names beginning with H5T, where T stands for type. You cannot perform any computations on these HDF5 datatypes, though. Their purpose is similar to that of MPI datatypes: by defining an HDF5 datatype you tell HDF5 how it should extract data from your C-language structures and how it should write the data back into them.

The following program creates a simple one-dimensional dataset of length 10 in an HDF5 file. Each element of this dataset stores a structure with three members, whose C-language definition is:

    typedef struct s1_t {
	int    a;
	float  b;
	double c; 
    } s1_t;
The program will initialize an array of such structures in memory, then it will provide appropriate definitions to HDF5, and will transfer the content of the memory array to the HDF5 dataset.

In the second part of the program we are going to read the data back, in a special way that lets us select certain components of the structure only, and we'll print the results on standard output.

Here is how the program compiles, links and runs on the AVIDD cluster:

gustav@bh1 $ pwd
/N/B/gustav/src/I590/HDF5/compound
gustav@bh1 $ h5cc -o h5_compound h5_compound.c
gustav@bh1 $ ./h5_compound

Field c : 
1.0000 0.5000 0.3333 0.2500 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000 

Field a : 
0 1 2 3 4 5 6 7 8 9 

Field b : 
0.0000 1.0000 4.0000 9.0000 16.0000 25.0000 36.0000 49.0000 64.0000 81.0000 
gustav@bh1 $ ls
SDScompound.h5  h5_compound  h5_compound.c  h5_compound.o
gustav@bh1 $
And here, for the more inquisitive minds, is the h5dump of the file itself, followed by its h5ls listing:
gustav@bh1 $ h5dump SDScompound.h5
HDF5 "SDScompound.h5" {
GROUP "/" {
   DATASET "ArrayOfStructures" {
      DATATYPE  H5T_COMPOUND {
         H5T_STD_I32LE "a_name";
         H5T_IEEE_F64LE "c_name";
         H5T_IEEE_F32LE "b_name";
      }
      DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
      DATA {
         {
            0,
            1,
            0
         },
         {
            1,
            0.5,
            1
         },
         {
            2,
            0.333333,
            4
         },
         {
            3,
            0.25,
            9
         },
         {
            4,
            0.2,
            16
         },
         {
            5,
            0.166667,
            25
         },
         {
            6,
            0.142857,
            36
         },
         {
            7,
            0.125,
            49
         },
         {
            8,
            0.111111,
            64
         },
         {
            9,
            0.1,
            81
         }
      }
   }
}
}
gustav@bh1 $ h5ls -r -v SDScompound.h5
Opened "SDScompound.h5" with sec2 driver.
/ArrayOfStructures       Dataset {10/10}
    Location:  0:1:0:976
    Links:     1
    Modified:  2003-11-30 17:56:34 EST
    Storage:   160 logical bytes, 160 allocated bytes, 100.00% utilization
    Type:      struct {
                   "a_name"           +0    native int
                   "c_name"           +8    native double
                   "b_name"           +4    native float
               } 16 bytes
gustav@bh1 $
You can clearly see that information about the structure is stored in the file. The components of the structure are written to fields named "a_name", "c_name" and "b_name", listed in the order in which the program inserts them into the datatype. These names are not just decorative labels: you will use them to read data from selected fields into memory.

The program itself comes from the NCSA HDF5 Tutorial:

/*
 * This example shows how to create a compound data type,
 * write an array which has the compound data type to the file,
 * and read back fields' subsets.
 */

#include <stdio.h>
#include "hdf5.h"

#define FILE          "SDScompound.h5"
#define DATASETNAME   "ArrayOfStructures"
#define LENGTH        10
#define RANK          1

int
main(void)
{

    /* First structure  and dataset*/
    typedef struct s1_t {
	int    a;
	float  b;
	double c; 
    } s1_t;
    s1_t       s1[LENGTH];
    hid_t      s1_tid;     /* File datatype identifier */

    /* Second structure (subset of s1_t)  and dataset*/
    typedef struct s2_t {
	double c;
	int    a;
    } s2_t;
    s2_t       s2[LENGTH];
    hid_t      s2_tid;    /* Memory datatype handle */

    /* Third "structure" ( will be used to read float field of s1) */
    hid_t      s3_tid;   /* Memory datatype handle */
    float      s3[LENGTH];

    int        i;
    hid_t      file, dataset, space; /* Handles */
    herr_t     status;
    hsize_t    dim[] = {LENGTH};   /* Dataspace dimensions */


    /*
     * Initialize the data
     */
    for (i = 0; i< LENGTH; i++) {
        s1[i].a = i;
        s1[i].b = i*i;
        s1[i].c = 1./(i+1);
    }

    /*
     * Create the data space.
     */
    space = H5Screate_simple(RANK, dim, NULL);

    /*
     * Create the file.
     */
    file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /*
     * Create the memory data type. 
     */
    s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
    H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
    H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
    H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);

    /* 
     * Create the dataset.
     */
    dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT);

    /*
     * Write data to the dataset.
     */
    status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);

    /*
     * Release resources
     */
    H5Tclose(s1_tid);
    H5Sclose(space);
    H5Dclose(dataset);
    H5Fclose(file);
 
    /*
     * Open the file and the dataset.
     */
    file = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
 
    dataset = H5Dopen(file, DATASETNAME);

    /* 
     * Create a data type for s2
     */
    s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));

    H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
    H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);

    /*
     * Read two fields c and a from s1 dataset. Fields in the file
     * are found by their names "c_name" and "a_name".
     */
    status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);

    /*
     * Display the fields
     */
    printf("\n");
    printf("Field c : \n");
    for( i = 0; i < LENGTH; i++) printf("%.4f ", s2[i].c);
    printf("\n");

    printf("\n");
    printf("Field a : \n");
    for( i = 0; i < LENGTH; i++) printf("%d ", s2[i].a);
    printf("\n");

    /* 
     * Create a data type for s3.
     */
    s3_tid = H5Tcreate(H5T_COMPOUND, sizeof(float));

    status = H5Tinsert(s3_tid, "b_name", 0, H5T_NATIVE_FLOAT);

    /*
     * Read field b from s1 dataset. Field in the file is found by its name.
     */
    status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);

    /*
     * Display the field
     */
    printf("\n");
    printf("Field b : \n");
    for( i = 0; i < LENGTH; i++) printf("%.4f ", s3[i]);
    printf("\n");

    /*
     * Release resources
     */
    H5Tclose(s2_tid);
    H5Tclose(s3_tid);
    H5Dclose(dataset);
    H5Fclose(file);

    return 0;
}
Let us analyze the program.

The program begins by initializing the array of structures:

#define LENGTH        10
...
    typedef struct s1_t {
	int    a;
	float  b;
	double c; 
    } s1_t;
    s1_t       s1[LENGTH];
...
    for (i = 0; i< LENGTH; i++) {
        s1[i].a = i;
        s1[i].b = i*i;
        s1[i].c = 1./(i+1);
    }
Then we create a simple dataspace. Observe that there is no information provided at this stage about the structure of the entities the dataspace is made of:
#define LENGTH        10
#define RANK          1
...
    hsize_t    dim[] = {LENGTH};   
...
   space = H5Screate_simple(RANK, dim, NULL);
We create the HDF5 file:
    file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
and now we have to tell HDF5 more about the structure that the dataset is going to be made of:
    s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
    H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
    H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
    H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);
We have two new functions here. The first one, H5Tcreate, creates the new type. Its first parameter is an HDF5 constant, H5T_COMPOUND, which tells HDF5 that this is going to be a compound type. The other possible choices here are H5T_OPAQUE and H5T_ENUM. The second parameter specifies the size, in bytes, of the whole object.

The second function, H5Tinsert, tells HDF5 exactly how the new type, whose identifier is s1_tid, is made up.

The first call specifies that the first component is of type H5T_NATIVE_INT, that it should be inserted into the field named "a_name", and that it should be picked from the memory location corresponding to the offset of the a component in the C-language structure s1_t. A special HDF5 macro, HOFFSET, can be used to get this offset. You may recall that we had a function in MPI, called MPI_Get_address, whose purpose was, similarly, to find offsets of various structure components in order to provide MPI_Type_struct with displacements.

The second call tells HDF5 that the second member of the compound type is of type H5T_NATIVE_DOUBLE and that its memory location corresponds to the location of the c component of the s1_t structure, which is the third member of the C structure. It should be inserted into the field named "c_name".

Finally, the third call tells HDF5 that the third member is to be inserted into the field called "b_name", that it is an H5T_NATIVE_FLOAT, and that its memory location corresponds to the location of the b component in the s1_t structure.
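
The offsets passed to H5Tinsert can be inspected without HDF5 at all, because HOFFSET does the same job as the standard C macro offsetof from <stddef.h>. The following is only a sketch in plain C; the helper name print_s1_layout is made up for this illustration and is not part of HDF5.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

/* Same structure as in the program above. */
typedef struct s1_t {
    int    a;
    float  b;
    double c;
} s1_t;

/* Hypothetical helper (not an HDF5 call): prints where each member
 * of s1_t lives inside the structure and how big it is. */
void print_s1_layout(void)
{
    /* C guarantees that the first member sits at offset 0. */
    assert(offsetof(s1_t, a) == 0);

    printf("a: offset %zu, size %zu\n", offsetof(s1_t, a), sizeof(int));
    printf("b: offset %zu, size %zu\n", offsetof(s1_t, b), sizeof(float));
    printf("c: offset %zu, size %zu\n", offsetof(s1_t, c), sizeof(double));
    printf("whole structure: %zu bytes\n", sizeof(s1_t));
}
```

On a platform with the same layout as the one h5ls reported above, calling print_s1_layout() should show offsets 0, 4 and 8 and a total of 16 bytes. Other compilers or architectures may pad the structure differently, which is exactly why the program asks the compiler for the offsets instead of hard-coding them.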

Now we are finally ready to create the dataset:

#define DATASETNAME   "ArrayOfStructures"
...
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT);
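The combination of the two parameters, (s1_tid, space), tells H5Dcreate what the dataset should look like: space gives its shape, here ten elements in one dimension, and s1_tid describes the layout of each element.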
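
The division of labour between the two parameters also explains the storage figure in the h5ls listing: the dataspace contributes the number of elements and the datatype contributes the size of each one. A small sketch of this arithmetic in plain C follows; dataset_bytes is a hypothetical helper written for this note, not an HDF5 function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an HDF5 call): number of bytes needed for
 * a dataset with the given dimensions whose elements each occupy
 * elem_size bytes. */
size_t dataset_bytes(int rank, const size_t dim[], size_t elem_size)
{
    size_t n = 1;

    assert(rank >= 1);
    for (int i = 0; i < rank; i++)
        n *= dim[i];            /* total number of elements */
    return n * elem_size;       /* elements times bytes per element */
}
```

For our dataset, with rank 1, dim[] = {10} and a 16-byte element, the function returns 160, which agrees with the "160 logical bytes" reported by h5ls.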

Now we can perform a simple HDF5 write and release all resources:

    status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);
    H5Tclose(s1_tid);
    H5Sclose(space);
    H5Dclose(dataset);
    H5Fclose(file);

In the second part of the program we open the file for reading, open the dataset, and then read the data. But we read it a little differently from the way we wrote it, because in the first read we are going to read only the "c_name" and "a_name" fields of the dataset. To do this we create a new type:

#define LENGTH        10
...
    typedef struct s2_t {
	double c;
	int    a;
    } s2_t;
    s2_t       s2[LENGTH];
...
    s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));
    H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
    H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);
and use it to read the selected fields from the dataset. The data then goes directly into s2, into the slots specified by HOFFSET(s2_t,c) and HOFFSET(s2_t,a) respectively:
    status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);
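
To see why the field names matter, it may help to look at a toy model of what such a read has to accomplish: for every record, each member of the destination type is paired with the source member of the same name, and the bytes are moved from one offset to the other. The sketch below is plain C and deliberately simplified. The names field_t, copy_by_name, s1_fields and s2_fields are invented for this illustration; this is not how HDF5 is implemented, and unlike HDF5 the toy performs no type or byte-order conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one member of a compound type: a name, an offset
 * into the structure, and a size. Not an HDF5 data structure. */
typedef struct field_t {
    const char *name;
    size_t      offset;
    size_t      size;
} field_t;

typedef struct s1_t { int a; float b; double c; } s1_t;
typedef struct s2_t { double c; int a; } s2_t;

/* Field tables mirroring the H5Tinsert calls in the program. */
static const field_t s1_fields[] = {
    {"a_name", offsetof(s1_t, a), sizeof(int)},
    {"c_name", offsetof(s1_t, c), sizeof(double)},
    {"b_name", offsetof(s1_t, b), sizeof(float)},
};
static const field_t s2_fields[] = {
    {"c_name", offsetof(s2_t, c), sizeof(double)},
    {"a_name", offsetof(s2_t, a), sizeof(int)},
};

/* Copy n records from src to dst, member by member, pairing members
 * by name. Source members with no namesake in dst are skipped. */
void copy_by_name(void *dst, size_t dst_rec, const field_t *dst_f, size_t ndst,
                  const void *src, size_t src_rec, const field_t *src_f, size_t nsrc,
                  size_t n)
{
    for (size_t r = 0; r < n; r++) {
        char       *d = (char *)dst + r * dst_rec;
        const char *s = (const char *)src + r * src_rec;

        for (size_t i = 0; i < ndst; i++) {
            for (size_t j = 0; j < nsrc; j++) {
                if (strcmp(dst_f[i].name, src_f[j].name) == 0) {
                    /* The toy only handles members of identical size. */
                    assert(dst_f[i].size == src_f[j].size);
                    memcpy(d + dst_f[i].offset, s + src_f[j].offset,
                           src_f[j].size);
                }
            }
        }
    }
}
```

With an array s1 filled as in the program, the call copy_by_name(s2, sizeof(s2_t), s2_fields, 2, s1, sizeof(s1_t), s1_fields, 3, LENGTH) would leave the a and c values of each record in s2, while the "b_name" member is simply ignored because s2_fields does not mention it.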
Having read the data we print it on standard output:
    printf("\n");
    printf("Field c : \n");
    for( i = 0; i < LENGTH; i++) printf("%.4f ", s2[i].c);
    printf("\n");

    printf("\n");
    printf("Field a : \n");
    for( i = 0; i < LENGTH; i++) printf("%d ", s2[i].a);
    printf("\n");
Now we create an HDF5 type for reading the "b_name" field:
    s3_tid = H5Tcreate(H5T_COMPOUND, sizeof(float));
    status = H5Tinsert(s3_tid, "b_name", 0, H5T_NATIVE_FLOAT);
The offset in this case is zero, because we are going to read the data into a plain array of floats, s3. Then we read the field and print the array s3 on standard output:
#define LENGTH        10
...
    float      s3[LENGTH];
...
    status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);
    printf("\n");
    printf("Field b : \n");
    for( i = 0; i < LENGTH; i++) printf("%.4f ", s3[i]);
    printf("\n");

Having done all this, we close the shop and exit.

    H5Tclose(s2_tid);
    H5Tclose(s3_tid);
    H5Dclose(dataset);
    H5Fclose(file);
    return 0;


Zdzislaw Meglicki
2004-04-29