All our HDF5 examples so far were concerned with writing very elementary data
on simple datasets. The data were all 32-bit integers to keep things simple,
the only touch of excitement being little-endian versus big-endian.
But what if we have some richly structured data, a mixture of characters, floats,
integers and what not? Following the example of MPI, HDF5 provides a rich interface
for the creation of new HDF5 datatypes. All functions in this family have names
beginning with H5T, where T stands for type.
You cannot perform any computations on these HDF5
datatypes though. Their purpose is similar to the purpose of MPI datatypes. By
defining HDF5 datatypes you tell HDF5 how it should extract data from your C-language
structures and how it should write the data back into them.
The following program creates a simple one-dimensional dataset of length 10 in an HDF5 file. Each element of this dataset stores a three-member structure, whose C-language definition is:
typedef struct s1_t {
int a;
float b;
double c;
} s1_t;
The program will initialize an array of such structures in memory, then it will
provide appropriate definitions to HDF5, and will transfer the content of the
memory array to the HDF5 dataset.
In the second part of the program we are going to read the data back, in a special way that lets us select certain components of the structure only, and we'll print the results on standard output.
Here is how the program compiles, links and runs on the AVIDD cluster:
gustav@bh1 $ pwd
/N/B/gustav/src/I590/HDF5/compound
gustav@bh1 $ h5cc -o h5_compound h5_compound.c
gustav@bh1 $ ./h5_compound

Field c : 
1.0000 0.5000 0.3333 0.2500 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000 

Field a : 
0 1 2 3 4 5 6 7 8 9 

Field b : 
0.0000 1.0000 4.0000 9.0000 16.0000 25.0000 36.0000 49.0000 64.0000 81.0000 
gustav@bh1 $ ls
SDScompound.h5  h5_compound  h5_compound.c  h5_compound.o
gustav@bh1 $
And here, for the more inquisitive minds, is the h5dump of the file itself,
followed by its h5ls listing:
gustav@bh1 $ h5dump SDScompound.h5
HDF5 "SDScompound.h5" {
GROUP "/" {
DATASET "ArrayOfStructures" {
DATATYPE H5T_COMPOUND {
H5T_STD_I32LE "a_name";
H5T_IEEE_F64LE "c_name";
H5T_IEEE_F32LE "b_name";
}
DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
DATA {
{
0,
1,
0
},
{
1,
0.5,
1
},
{
2,
0.333333,
4
},
{
3,
0.25,
9
},
{
4,
0.2,
16
},
{
5,
0.166667,
25
},
{
6,
0.142857,
36
},
{
7,
0.125,
49
},
{
8,
0.111111,
64
},
{
9,
0.1,
81
}
}
}
}
}
gustav@bh1 $ h5ls -r -v SDScompound.h5
Opened "SDScompound.h5" with sec2 driver.
/ArrayOfStructures Dataset {10/10}
Location: 0:1:0:976
Links: 1
Modified: 2003-11-30 17:56:34 EST
Storage: 160 logical bytes, 160 allocated bytes, 100.00% utilization
Type: struct {
"a_name" +0 native int
"c_name" +8 native double
"b_name" +4 native float
} 16 bytes
gustav@bh1 $
You can clearly see that information about the structure is provided. The components of
the structure are written on fields named "a_name", "c_name" and "b_name".
These names are important. They are not just decorative labels. You will use these names
in order to read data from selected fields into memory.
The program itself comes from the NCSA HDF5 Tutorial:
/*
* This example shows how to create a compound data type,
* write an array which has the compound data type to the file,
* and read back fields' subsets.
*/
#include <stdio.h>
#include "hdf5.h"
#define FILE "SDScompound.h5"
#define DATASETNAME "ArrayOfStructures"
#define LENGTH 10
#define RANK 1
int
main(void)
{
/* First structure and dataset*/
typedef struct s1_t {
int a;
float b;
double c;
} s1_t;
s1_t s1[LENGTH];
hid_t s1_tid; /* File datatype identifier */
/* Second structure (subset of s1_t) and dataset*/
typedef struct s2_t {
double c;
int a;
} s2_t;
s2_t s2[LENGTH];
hid_t s2_tid; /* Memory datatype handle */
/* Third "structure" ( will be used to read float field of s1) */
hid_t s3_tid; /* Memory datatype handle */
float s3[LENGTH];
int i;
hid_t file, dataset, space; /* Handles */
herr_t status;
hsize_t dim[] = {LENGTH}; /* Dataspace dimensions */
/*
* Initialize the data
*/
for (i = 0; i< LENGTH; i++) {
s1[i].a = i;
s1[i].b = i*i;
s1[i].c = 1./(i+1);
}
/*
* Create the data space.
*/
space = H5Screate_simple(RANK, dim, NULL);
/*
* Create the file.
*/
file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
/*
* Create the memory data type.
*/
s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);
/*
* Create the dataset.
*/
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT);
/*
* Write data to the dataset.
*/
status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);
/*
* Release resources
*/
H5Tclose(s1_tid);
H5Sclose(space);
H5Dclose(dataset);
H5Fclose(file);
/*
* Open the file and the dataset.
*/
file = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
dataset = H5Dopen(file, DATASETNAME);
/*
* Create a data type for s2
*/
s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));
H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);
/*
* Read two fields c and a from s1 dataset. Fields in the file
* are found by their names "c_name" and "a_name".
*/
status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);
/*
* Display the fields
*/
printf("\n");
printf("Field c : \n");
for( i = 0; i < LENGTH; i++) printf("%.4f ", s2[i].c);
printf("\n");
printf("\n");
printf("Field a : \n");
for( i = 0; i < LENGTH; i++) printf("%d ", s2[i].a);
printf("\n");
/*
* Create a data type for s3.
*/
s3_tid = H5Tcreate(H5T_COMPOUND, sizeof(float));
status = H5Tinsert(s3_tid, "b_name", 0, H5T_NATIVE_FLOAT);
/*
* Read field b from s1 dataset. Field in the file is found by its name.
*/
status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);
/*
* Display the field
*/
printf("\n");
printf("Field b : \n");
for( i = 0; i < LENGTH; i++) printf("%.4f ", s3[i]);
printf("\n");
/*
* Release resources
*/
H5Tclose(s2_tid);
H5Tclose(s3_tid);
H5Dclose(dataset);
H5Fclose(file);
return 0;
}
Let us analyze the program.
The program begins by initializing the array of structures:
#define LENGTH 10
...
typedef struct s1_t {
int a;
float b;
double c;
} s1_t;
s1_t s1[LENGTH];
...
for (i = 0; i< LENGTH; i++) {
s1[i].a = i;
s1[i].b = i*i;
s1[i].c = 1./(i+1);
}
Then we create a simple dataspace. Observe that there is no information
provided at this stage about the structure of the entities the dataspace is made of:
#define LENGTH 10
#define RANK 1
...
hsize_t dim[] = {LENGTH};
...
space = H5Screate_simple(RANK, dim, NULL);
We create the HDF5 file:
file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
and now we have to tell HDF5 more about the structure that the dataset is going to
be made of:
s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);
We have two new functions here. The first one, H5Tcreate, creates the new type.
The first parameter in this call is an HDF5 constant, H5T_COMPOUND, which tells
HDF5 that this is going to be a compound object. The other possible choices here
are H5T_OPAQUE and H5T_ENUM. The second parameter specifies the size of the whole
object in bytes.
The second function, H5Tinsert, is used to tell HDF5 how exactly the new type,
whose identifier is s1_tid, is made up.
The first call specifies that the first member of the new type is of type
H5T_NATIVE_INT, that it should be stored in the field named "a_name", and that it
should be picked up from the memory location that corresponds to the offset of the
a component in the C-language structure s1_t. A special HDF5 macro, HOFFSET, can be
used to get this offset. You may recall that we had a function in MPI, called
MPI_Get_address, whose purpose was, similarly, to find the addresses of various
structure components, from which the displacements for MPI_Type_struct were computed.
The second call tells HDF5 that the second member of the new type is
of type H5T_NATIVE_DOUBLE and that its memory location corresponds to the location
of the c component of the s1_t structure. It should be stored in the field named
"c_name". The members need not be inserted in the order in which the C structure
declares them, because each one carries its own offset.
Finally, the third call tells HDF5 that the third member is to be stored in
the field called "b_name", that it is an H5T_NATIVE_FLOAT,
and that its memory location corresponds to the location of the b component in the
s1_t structure.
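The offsets passed to H5Tinsert are ordinary byte offsets within the C structure. The sketch below uses no HDF5 calls at all. It assumes that HOFFSET yields the same value as the standard C offsetof macro, and it prints the layout of s1_t on the machine where it is compiled. The names member_info, s1_members and print_s1_layout are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct s1_t {
    int    a;
    float  b;
    double c;
} s1_t;

/* Hypothetical record describing one member: name, byte offset, byte size. */
typedef struct member_info {
    const char *name;
    size_t      offset;
    size_t      size;
} member_info;

/* Members listed in the same order as the H5Tinsert calls: a, c, b. */
static const member_info s1_members[3] = {
    { "a_name", offsetof(s1_t, a), sizeof(int)    },
    { "c_name", offsetof(s1_t, c), sizeof(double) },
    { "b_name", offsetof(s1_t, b), sizeof(float)  },
};

/* Print each member's offset and size, then the size of the whole struct. */
void print_s1_layout(void)
{
    /* C guarantees that the first member of a struct starts at offset 0. */
    assert(s1_members[0].offset == 0);

    for (size_t i = 0; i < 3; i++)
        printf("%-8s +%zu (%zu bytes)\n", s1_members[i].name,
               s1_members[i].offset, s1_members[i].size);
    printf("sizeof(s1_t) = %zu bytes\n", sizeof(s1_t));
}
```

Calling print_s1_layout() from main shows the numbers for your own compiler. On the system used for the h5ls listing above the offsets were 0, 8 and 4 for "a_name", "c_name" and "b_name", and the structure took 16 bytes, which is why ten elements account for 160 logical bytes. Other compilers may pad the structure differently.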
Now we are finally ready to create the dataset:
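#define DATASETNAME "ArrayOfStructures"
...
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT);
The combination of the two parameters, (s1_tid, space), tells H5Dcreate
what the dataset should really be like: space gives its shape, here ten elements
in one dimension, and s1_tid describes what each element is made of.
Note that this five-argument call is the HDF5 1.6 form of H5Dcreate; HDF5 1.8 and
later name it H5Dcreate1 and add H5Dcreate2, which takes two more property lists.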
Now we can perform the simple HDF5 write and release all resources:
status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);
H5Tclose(s1_tid);
H5Sclose(space);
H5Dclose(dataset);
H5Fclose(file);
In the second part of the program we open the file for reading, open the dataset,
and then read the data. But we read it a little differently from how we wrote
it. In the first read we are going to read only the
"c_name" and "a_name" fields of the dataset. To do this we create
a new type:
#define LENGTH 10
...
typedef struct s2_t {
double c;
int a;
} s2_t;
s2_t s2[LENGTH];
...
s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));
H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);
and use it to read the selected fields from the dataset. HDF5 finds the fields
in the file by their names, "c_name" and "a_name", and the data then goes
directly into s2, into the slots specified by HOFFSET(s2_t,c) and
HOFFSET(s2_t,a) respectively:
status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);
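To make the idea of matching fields by name more concrete, here is a toy model in plain C with no HDF5 calls. It is only an illustration of the principle, not a description of how the library is implemented: the names field_desc, s1_fields, s2_fields and copy_by_name are hypothetical, and the sketch copies bytes only between members of identical size, whereas HDF5 can also convert between types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct s1_t { int a; float b; double c; } s1_t;  /* layout as written */
typedef struct s2_t { double c; int a; } s2_t;           /* layout read into  */

/* Hypothetical member descriptor: field name, byte offset, byte size. */
typedef struct field_desc {
    const char *name;
    size_t      offset;
    size_t      size;
} field_desc;

static const field_desc s1_fields[] = {
    { "a_name", offsetof(s1_t, a), sizeof(int)    },
    { "c_name", offsetof(s1_t, c), sizeof(double) },
    { "b_name", offsetof(s1_t, b), sizeof(float)  },
};

static const field_desc s2_fields[] = {
    { "c_name", offsetof(s2_t, c), sizeof(double) },
    { "a_name", offsetof(s2_t, a), sizeof(int)    },
};

/*
 * For every destination field, find the source field with the same name
 * and size and copy its bytes across. Returns the number of fields copied.
 */
size_t copy_by_name(void *dst, const field_desc *dst_fields, size_t n_dst,
                    const void *src, const field_desc *src_fields, size_t n_src)
{
    size_t copied = 0;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n_dst; i++) {
        for (size_t j = 0; j < n_src; j++) {
            if (strcmp(dst_fields[i].name, src_fields[j].name) == 0 &&
                dst_fields[i].size == src_fields[j].size) {
                memcpy((char *)dst + dst_fields[i].offset,
                       (const char *)src + src_fields[j].offset,
                       dst_fields[i].size);
                copied++;
                break;
            }
        }
    }
    return copied;
}
```

Given an s1_t record and an s2_t destination, copy_by_name(&out, s2_fields, 2, &rec, s1_fields, 3) fills out.c and out.a and skips "b_name", because no destination field asks for it. The order of the fields in either table does not matter, only the names do.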
Having read the data we print it on standard output:
printf("\n");
printf("Field c : \n");
for( i = 0; i < LENGTH; i++) printf("%.4f ", s2[i].c);
printf("\n");
printf("\n");
printf("Field a : \n");
for( i = 0; i < LENGTH; i++) printf("%d ", s2[i].a);
printf("\n");
Now we create an HDF5 type for reading the "b_name" field:
s3_tid = H5Tcreate(H5T_COMPOUND, sizeof(float));
status = H5Tinsert(s3_tid, "b_name", 0, H5T_NATIVE_FLOAT);
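The offset in this case is zero, because we are going to read the data
into a plain array of floats, s3. Each element of this memory type consists of
a single float, so the size of the type is sizeof(float) and its only member
sits at the very beginning.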
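The effect of a one-member type of size sizeof(float) with offset zero can be pictured as a strided gather: one member is picked out of every record and the results are packed back to back. The following plain C sketch shows that picture under the assumption that no type conversion is needed. The function gather_member is a made-up helper, not part of HDF5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct s1_t { int a; float b; double c; } s1_t;

/*
 * Copy one member out of each of n records into a packed destination array.
 * src_stride is the size of one source record; offset and size locate the
 * member inside it. In the destination, element i starts at i * size, which
 * is what a single member at offset 0 in a type of that size describes.
 */
void gather_member(void *dst, const void *src, size_t n,
                   size_t src_stride, size_t offset, size_t size)
{
    /* The member must lie entirely inside one source record. */
    assert(offset + size <= src_stride);

    for (size_t i = 0; i < n; i++)
        memcpy((char *)dst + i * size,
               (const char *)src + i * src_stride + offset,
               size);
}
```

With s1 initialized as in the program, gather_member(s3, s1, LENGTH, sizeof(s1_t), offsetof(s1_t, b), sizeof(float)) would leave the squares 0, 1, 4, ... in s3, the same values the program prints for field b.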
Then we read the field and print the array s3 on standard output:
#define LENGTH 10
...
float s3[LENGTH];
...
status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);
printf("\n");
printf("Field b : \n");
for( i = 0; i < LENGTH; i++) printf("%.4f ", s3[i]);
printf("\n");
Having done all this, we close up shop and exit:
H5Tclose(s2_tid);
H5Tclose(s3_tid);
H5Dclose(dataset);
H5Fclose(file);
return 0;