By default, HDF5 datasets are written uncompressed and without error checking. Both features can be turned on by means of dataset creation property lists. In this section I am going to show you how to activate compression.
The following program, taken from the NCSA HDF5 Tutorial, works as follows. First it creates a standard HDF5 data file
called zip.h5. A group /Data is then created
in the file. Next we generate a property list for
dataset creation. The list activates two features:
chunking and compression. Then we create the dataset
/Data/Compressed_Data using the list. The data itself is
generated, then written to the dataset. At this stage we close
the dataspace, the dataset, the group and the file.
Then we re-open the file, the group and the dataset. The data is read in full. There is no need for any hocus pocus with property lists here, because the required property is already attached to the dataset in the file and HDF5 learns about it when it opens the dataset. Decompression is activated automatically when the data is read. Having read the data, we print a small portion of it on standard output, then close the dataset, the group and the file.
Here's the program:
/* Create compressed dataset */
#include <stdio.h>   /* printf */
#include "hdf5.h"
#define FILE "zip.h5"
/* Uncomment to remove compression and
   comment out line above
#define FILE "unzip.h5"
*/
#define RANK 2
int
main(void)
{
    hid_t file, grp;
    hid_t dataset, dataspace;
    hid_t plist;
    herr_t status;
    hsize_t dims[2];
    hsize_t cdims[2];
    int idx;
    int i, j;
    int buf[1000][20];
    int rbuf[1000][20];
    /*
     * Create a file.
     */
    file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    printf("H5Fcreate returns: %d\n", file);
    /*
     * Create a group in the file.
     */
    grp = H5Gcreate(file, "/Data", 0);
    printf("H5Gcreate returns: %d\n", grp);
    /*
     * Create dataset "Compressed_Data" in the group using absolute
     * name. Dataset creation property list is modified to use
     * GZIP compression with the compression effort set to 6.
     * Note that compression can be used only when dataset is chunked.
     */
    dims[0] = 1000;
    dims[1] = 20;
    cdims[0] = 20;
    cdims[1] = 20;
    dataspace = H5Screate_simple(RANK, dims, NULL);
    printf("H5Screate_simple: %d\n", dataspace);
    /* This section enables GZIP compression. To write the data
       uncompressed, comment it out together with the H5Dcreate call
       that follows, and uncomment the alternative H5Dcreate call below.
    */
    plist = H5Pcreate(H5P_DATASET_CREATE);
    printf("H5Pcreate returns: %d\n", plist);
    status = H5Pset_chunk(plist, 2, cdims);
    printf("H5Pset_chunk returns: %d\n", status);
    status = H5Pset_deflate(plist, 6);
    printf("H5Pset_deflate returns: %d\n", status);
    dataset = H5Dcreate(file, "/Data/Compressed_Data", H5T_STD_I32BE,
                        dataspace, plist);
    /*
    dataset = H5Dcreate(file, "/Data/Uncompressed_Data", H5T_STD_I32BE,
                        dataspace, H5P_DEFAULT);
    */
    printf("H5Dcreate returns: %d\n", dataset);
    /* Fill the write buffer with easily recognisable values. */
    for (i = 0; i < dims[0]; i++) {
        for (j = 0; j < dims[1]; j++) {
            buf[i][j] = i + j;
        }
    }
    status = H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    printf("H5Dwrite: %d\n", status);
    status = H5Sclose(dataspace);
    printf("H5Sclose: %d\n", status);
    status = H5Dclose(dataset);
    printf("H5Dclose: %d\n", status);
    status = H5Gclose(grp);
    printf("H5Gclose: %d\n", status);
    status = H5Fclose(file);
    printf("H5Fclose: %d\n", status);
    /*
     * Now reopen the file and group in the file.
     */
    file = H5Fopen(FILE, H5F_ACC_RDWR, H5P_DEFAULT);
    printf("H5Fopen: %d\n", file);
    grp = H5Gopen(file, "Data");
    printf("H5Gopen: %d\n", grp);
    dataset = H5Dopen(grp, "Compressed_Data");
    /* Uncomment, if removing compression
       and comment out line above
    dataset = H5Dopen(grp, "Uncompressed_Data");
    */
    printf("H5Dopen: %d\n", dataset);
    status = H5Dread(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                     H5P_DEFAULT, rbuf);
    printf("\nData (10 lines):\n");
    for (i = 0; i < 10; i++)
    {
        for (j = 0; j < 20; j++)
            printf(" %d", rbuf[i][j]);
        printf("\n");
    }
    status = H5Dclose(dataset);
    printf("\nH5Dclose: %d\n", status);
    status = H5Gclose(grp);
    printf("H5Gclose: %d\n", status);
    status = H5Fclose(file);
    printf("H5Fclose: %d\n", status);
    return 0;
}
The program is compiled and linked with h5cc and run
normally by invoking its name:
gustav@bh1 $ h5cc -o h5_zip h5_zip.c
gustav@bh1 $ ./h5_zip
H5Fcreate returns: 67108864
H5Gcreate returns: 201326592
H5Screate_simple: 335544322
H5Pcreate returns: 805306377
H5Pset_chunk returns: 0
H5Pset_deflate returns: 0
H5Dcreate returns: 402653184
H5Dwrite: 0
H5Sclose: 0
H5Dclose: 0
H5Gclose: 0
H5Fclose: 0
H5Fopen: 67108865
H5Gopen: 201326593
H5Dopen: 402653185
Data (10 lines):
 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
H5Dclose: 0
H5Gclose: 0
H5Fclose: 0
gustav@bh1 $
As the program runs, it prints the return values of its HDF5 function calls on standard output. The small portion of data printed at the end shows that the data read back from the compressed dataset is indeed correct.
But has the data been compressed? The data array itself is 1000 × 20, so there are
20,000 4-byte integers in the dataset, which translates into 80,000 bytes.
But the file is only 11,312 bytes long:
gustav@bh1 $ ls -l zip.h5
-rw-r--r-- 1 gustav ucs 11312 Nov 24 12:56 zip.h5
gustav@bh1 $
so the data in it must indeed have been compressed. You can run
h5dump on this file and you'll get all the data back
uncompressed, but its output gives no hint that the data in
the file is compressed. To see that, look at the file
with h5ls:
gustav@bh1 $ h5ls -r -v zip.h5
Opened "zip.h5" with sec2 driver.
/Data Group
Location: 0:1:0:1576
Links: 1
/Data/Compressed_Data Dataset {1000/1000, 20/20}
Location: 0:1:0:1952
Links: 1
Modified: 2003-11-24 12:56:24 EST
Chunks: {20, 20} 1600 bytes
Storage: 80000 logical bytes, 5316 allocated bytes, 1504.89% utilization
Filter-0: deflate-1 OPT {6}
Type: 32-bit big-endian integer
gustav@bh1 $
Here you can see that the 80,000 logical bytes have been squeezed into
5,316 physical bytes and that a deflate-1 OPT {6} filter
has been used, as we requested with the call to H5Pset_deflate:
status = H5Pset_deflate( plist, 6);
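The figures in the h5ls report can be cross-checked by hand. What follows is a small sketch in plain C, with no HDF5 calls, that recomputes them from the dataset and chunk dimensions used in the program. The helper names are made up for this illustration and are not part of the HDF5 API, and the utilization formula (logical bytes divided by allocated bytes, times 100) is inferred from the listing rather than taken from the h5ls documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of the HDF5 API. */

/* Bytes occupied by a rows x cols array of elements of elem_size bytes. */
size_t array_bytes(size_t rows, size_t cols, size_t elem_size)
{
    return rows * cols * elem_size;
}

/* Number of chunks needed to cover a rows x cols dataset.
 * A partial chunk at an edge still counts as one whole chunk. */
size_t chunk_count(size_t rows, size_t cols, size_t crows, size_t ccols)
{
    assert(crows > 0 && ccols > 0);
    return ((rows + crows - 1) / crows) * ((cols + ccols - 1) / ccols);
}

/* Utilization in percent, assumed here to be logical / allocated * 100. */
double utilization_percent(size_t logical, size_t allocated)
{
    assert(allocated > 0);
    return 100.0 * (double)logical / (double)allocated;
}
```

With the values from this section, array_bytes(1000, 20, 4) gives the 80,000 logical bytes, array_bytes(20, 20, 4) gives the 1,600 bytes per chunk, chunk_count(1000, 20, 20, 20) gives 50 chunks, and utilization_percent(80000, 5316) comes out at about 1504.89, which is a compression ratio of roughly 15 to 1. The 5,316 allocated bytes are copied from the listing; they will differ for other data or other compression settings.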
There are no new elements in this program other than the call
to H5Pset_deflate, so I won't discuss the program in
detail. It should be easy for you to see, by now, how the program
goes about its business. Function H5Pset_deflate takes
a property list as its first argument. The function activates
deflate compression, the algorithm used by GNU gzip, on the data.
If you look at the
gzip man page, you'll see that you can regulate the
trade-off between speed and compression by calling gzip with a flag such as
-1 or -9. -1, which is equivalent to
--fast, results in very fast but not very effective
compression. On the other hand -9, which is equivalent to
--best, results in slow but very effective compression.
You can do the same when you call H5Pset_deflate. The
second argument is the compression level, and it plays the same role as the gzip flag.
The values 1 to 9 correspond to the gzip flags; the HDF5 reference
documentation gives the valid range as 0 through 9.
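The correspondence between gzip flags and the level argument can be made concrete with a few lines of plain C. This is only a sketch: gzip_flag_to_level is a hypothetical helper invented for this illustration, it is not part of HDF5 or gzip, and it handles only the flags mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of HDF5 or gzip: translate a gzip-style
 * flag ("-1" .. "-9", "--fast", "--best") into a deflate level.
 * Returns -1 if the flag is not one of these. */
int gzip_flag_to_level(const char *flag)
{
    assert(flag != NULL);
    if (strcmp(flag, "--fast") == 0)
        return 1;                    /* same as -1 */
    if (strcmp(flag, "--best") == 0)
        return 9;                    /* same as -9 */
    /* A dash followed by exactly one digit from 1 to 9. */
    if (strlen(flag) == 2 && flag[0] == '-' &&
        flag[1] >= '1' && flag[1] <= '9')
        return flag[1] - '0';
    return -1;                       /* not a recognised level flag */
}
```

A non-negative result could then be passed as the second argument of H5Pset_deflate in place of the literal 6 used in the program, for example to compare file sizes at levels 1 and 9.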