
Iterating over HDF5 Groups

You are no doubt familiar with the -R switch of the ls command and the -r switch of the h5ls command, both of which invoke a recursive traversal of the whole directory tree. HDF5 provides a function called H5Giterate, which iterates over the members of a group and applies a user-specified function, presumably built from other HDF5 functions, to every object it finds there. Program h5ls, when invoked with the -r option, is an example of how to use this function.

In this section we are going to write our own, much simpler version of h5ls. The program is called h5_iterate. It creates an HDF5 file and endows it with a certain structure. Then it iterates over the members of the "/" group and reports the type of each member it finds.

Here is how this program works on our AVIDD cluster.

gustav@bh1 $ pwd
/N/B/gustav/src/I590/HDF5/iterate
gustav@bh1 $ h5cc -o h5_iterate h5_iterate.c
gustav@bh1 $ ./h5_iterate
 Objects in the root group are:

 Object with name Dataset1 is a dataset 
 Object with name Datatype1 is a named datatype 
 Object with name Group1 is a group 
gustav@bh1 $ ls
h5_iterate  h5_iterate.c  h5_iterate.o  iterate.h5
gustav@bh1 $
We can have a closer look at what's inside the file created by the program, iterate.h5, with h5dump:
gustav@bh1 $ h5dump iterate.h5
HDF5 "iterate.h5" {
GROUP "/" {
   DATASET "Dataset1" {
      DATATYPE  H5T_STD_U32LE
      DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
      DATA {
         0, 0, 0, 0
      }
   }
   DATATYPE "Datatype1" H5T_COMPOUND {
      H5T_STD_I32LE "a";
      H5T_STD_I32LE "b";
      H5T_IEEE_F32LE "c";
   }
   GROUP "Group1" {
   }
}
}
gustav@bh1 $
The iterator, H5Giterate, will not, by itself, go into subgroups recursively. For this to happen you will have to write a recursive procedure yourself, but this recursive procedure can be based on H5Giterate.

Here is the program, which comes from the NCSA HDF5 Tutorial:

#include <stdio.h>
#include "hdf5.h"

#define FILE    "iterate.h5"
#define FALSE   0

/* 1-D dataset with fixed dimensions */
#define SPACE1_NAME  "Space1"
#define SPACE1_RANK	1
#define SPACE1_DIM1	4

herr_t file_info(hid_t loc_id, const char *name, void *opdata);
                                     /* Operator function */
int 
main(void) {
    hid_t		file;		/* HDF5 File IDs		*/
    hid_t		dataset;	/* Dataset ID			*/
    hid_t		group;      /* Group ID             */
    hid_t		sid;       /* Dataspace ID			*/
    hid_t		tid;       /* Datatype ID			*/
    hsize_t		dims[] = {SPACE1_DIM1};
    herr_t		ret;		/* Generic return value		*/

    /* Compound datatype */
    typedef struct s1_t {
        unsigned int a;
        unsigned int b;
        float c;
    } s1_t;

    /* Create file */
    file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Create dataspace for datasets */
    sid = H5Screate_simple(SPACE1_RANK, dims, NULL);

    /* Create a group */
    group=H5Gcreate(file,"Group1",-1);

    /* Close a group */
    ret = H5Gclose(group);

    /* Create a dataset  */
    dataset=H5Dcreate(file,"Dataset1",H5T_STD_U32LE,sid,H5P_DEFAULT);

    /* Close Dataset */
    ret = H5Dclose(dataset);

    /* Create a datatype */
    tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));

    /* Insert fields */
    ret=H5Tinsert (tid, "a", HOFFSET(s1_t,a), H5T_NATIVE_INT);

    ret=H5Tinsert (tid, "b", HOFFSET(s1_t,b), H5T_NATIVE_INT);

    ret=H5Tinsert (tid, "c", HOFFSET(s1_t,c), H5T_NATIVE_FLOAT);

    /* Save datatype for later */
    ret=H5Tcommit (file, "Datatype1", tid);

    /* Close datatype */
    ret = H5Tclose(tid);

    /* Iterate through the file to see members of the root group */

    printf(" Objects in the root group are:\n");
    printf("\n");

    H5Giterate(file, "/", NULL, file_info, NULL);

    /* Close file */
    ret = H5Fclose(file);

    return 0;
}

/*
 * Operator function.
 */
herr_t file_info(hid_t loc_id, const char *name, void *opdata)
{
    H5G_stat_t statbuf;

    /*
     * Get type of the object and display its name and type.
     * The name of the object is passed to this function by 
     * the Library. Some magic :-)
     */
    H5Gget_objinfo(loc_id, name, FALSE, &statbuf);
    switch (statbuf.type) {
    case H5G_GROUP: 
         printf(" Object with name %s is a group \n", name);
         break;
    case H5G_DATASET: 
         printf(" Object with name %s is a dataset \n", name);
         break;
    case H5G_TYPE: 
         printf(" Object with name %s is a named datatype \n", name);
         break;
    default:
         printf(" Unable to identify an object ");
    }
    return 0;
}

The program begins by creating an HDF5 file. It then creates a simple dataspace and a group called "Group1". The group is closed as soon as it has been created, because we are not going to do anything with it. Next the program creates a dataset using the simple dataspace and closes it too. Finally it creates a compound datatype that corresponds to the following C-language structure:

typedef struct s1_t {
    unsigned int a;
    unsigned int b;
    float c;
} s1_t;
and commit it to the file by calling function  H5Tcommit:
ret = H5Tcommit (file, "Datatype1", tid);
ret = H5Tclose(tid);
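This writes the datatype definition to the HDF5 file, although nothing in the file uses it yet. It can be retrieved from the file later and used, if need be. Note that the program registers members a and b as H5T_NATIVE_INT even though the structure declares them as unsigned int, which is why h5dump reports them as H5T_STD_I32LE.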

Now, having created all these objects, we invoke H5Giterate:

    printf(" Objects in the root group are:\n");
    printf("\n");
    H5Giterate(file, "/", NULL, file_info, NULL);
We are going to iterate over the group "/" of the file identified by file. To every object found there we apply function file_info, which is defined towards the end of the program source. The last parameter, here NULL, is a pointer that H5Giterate hands on unchanged to file_info; it can be used to pass additional values (e.g., a character "v" for "verbose listing"). The third parameter, which is NULL too, is a pointer to the integer index of the member at which to begin the iteration. If it is set to NULL, the iteration begins with the first member of the group.
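
To make the roles of the third and the last argument more concrete, here is a toy model written in plain C, with no HDF5 calls at all. Everything in it (toy_iterate, toy_op_t, list_opts_t, list_op, find_op) is hypothetical and is not how the library is implemented. The model assumes the convention described in the HDF5 reference manual: an operator that returns zero lets the iteration continue, a non-zero value stops it and becomes the return value of the iterator, and on return the index points at the next member to be processed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical operator type: like file_info, minus the hid_t argument. */
typedef int (*toy_op_t)(const char *name, void *opdata);

/*
 * Toy stand-in for the iterator, NOT the HDF5 implementation.
 * Starts at *idx (or at 0 when idx is NULL), calls op for each name,
 * stops at the first non-zero return value, and stores the index of
 * the next unvisited member back into *idx.
 */
int toy_iterate(const char *const names[], int n, int *idx,
                toy_op_t op, void *opdata)
{
    int i = (idx != NULL) ? *idx : 0;
    int ret = 0;

    while (i < n && ret == 0)
        ret = op(names[i++], opdata);
    if (idx != NULL)
        *idx = i;
    return ret;
}

/* Options handed to list_op through opdata. */
typedef struct {
    char mode;  /* 'v' requests a verbose listing */
    int  seen;  /* number of members visited so far */
} list_opts_t;

/* Counts every member and prints it when verbose mode is on. */
int list_op(const char *name, void *opdata)
{
    list_opts_t *opts = (list_opts_t *) opdata;

    opts->seen++;
    if (opts->mode == 'v')
        printf(" member %d is %s\n", opts->seen, name);
    return 0;   /* zero: carry on with the next member */
}

/* Stops the iteration once the name passed through opdata is found. */
int find_op(const char *name, void *opdata)
{
    return strcmp(name, (const char *) opdata) == 0 ? 1 : 0;
}
```

With the three names of our root group, Dataset1, Datatype1 and Group1, a call to toy_iterate with find_op, "Datatype1" as opdata, and an index initialised to 0 stops at the second name, returns 1, and leaves the index at 2, so that a subsequent call with the same index variable would resume at Group1.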

The function passed to H5Giterate, here file_info, must take three parameters, which H5Giterate supplies on every call: the location identifier of the group being iterated over, the name of the member just found, and the pointer we passed as the last argument of H5Giterate, which is NULL here:

herr_t file_info(hid_t loc_id, const char *name, void *opdata)
We have not talked about locations in this course yet. In HDF5 a location identifier is the identifier of a file or of a group, and an object name is always interpreted relative to some location. Here loc_id identifies the group being iterated over, and name is the name of one of its members. But this is all you need to know for now.

Function file_info calls  H5Gget_objinfo in order to find what the objects passed to it by H5Giterate are:

    H5Gget_objinfo(loc_id, name, FALSE, &statbuf);
The object type is provided in the statbuf structure, whose type member contains the information. The third parameter, which is set to FALSE, tells H5Gget_objinfo that it should not follow links.

Now, depending on the value returned in statbuf.type, function file_info writes a corresponding message on standard output and returns.

    switch (statbuf.type) {
    case H5G_GROUP: 
         printf(" Object with name %s is a group \n", name);
         break;
    case H5G_DATASET: 
         printf(" Object with name %s is a dataset \n", name);
         break;
    case H5G_TYPE: 
         printf(" Object with name %s is a named datatype \n", name);
         break;
    default:
         printf(" Unable to identify an object ");
    }
    return 0;

After H5Giterate has returned, we close the file and exit:

    ret = H5Fclose(file);
    return 0;


Zdzislaw Meglicki
2004-04-29