

The High Performance Storage System, HPSS, is a hierarchical massive data storage system, which can store many PBs of data on tape cartridges mounted inside automated silos, and which can transfer the data at more than one GB/s, if appropriately configured. HPSS is designed specifically to work with very large data objects and to serve clusters with parallel file systems such as GPFS (see section 3.4.1).

HPSS is used by Trilab, i.e., LANL, LLNL, and Sandia; BAE Systems (they have more than twenty HPSS installations); SDSC; NASA; Oak Ridge; Argonne (ANL); the National Climatic Data Center (NCDC); the National Centers for Environmental Prediction (NCEP); Brookhaven; JPL; SLAC; three research institutes in Japan (KEK, RIKEN, and ICRR); one research institute in Korea (KISTI); the European Centre for Medium-Range Weather Forecasts (ECMWF); the French Atomic Energy Commission (CEA) and the Institut National de Physique Nucléaire et de Physique des Particules (IN2P3); the University of Stuttgart in Germany; Indiana University, of course; and some other large customers. Although the number of HPSS users is not very large, the amount of data these users keep on HPSS is more than 50% of all the world's data, sic! HPSS is a very serious system for very serious Men in Black. It is not your average off-the-shelf Legato.

HPSS has been a remarkable success at Indiana University, even though we have not made much use of it in the high performance computing context yet. But HPSS is very flexible and it can be used for a lot of things.

Yet, as with any other system of this type (and there aren't that many), you must always remember when you work with HPSS that it is a tape storage system, even though it presents you with a file system interface when you connect to it with ftp, hsi, or pftp. This has some important ramifications.

Just as GPFS is a truly parallel file system, so HPSS is a truly parallel massive data storage system. This is why the two couple so well.

HPSS files can be striped over devices connected to multiple HPSS servers. It is possible to configure data transfer between HPSS and GPFS in such a way that a file is moved in parallel between HPSS servers and GPFS servers. This operation is highly scalable, i.e., you can stripe an HPSS file and its GPFS image over more and more servers and the data transfer rate will scale linearly with the number of servers added.

But such scalability is costly, since every new server and disk array you add costs at least a few thousand dollars. Still, a few thousand dollars for a GPFS or an HPSS server is very little compared to what such systems used to cost in the past. For example, a Convex machine, once amongst the best servers for the UniTree massive data storage system, used to cost several hundred thousand dollars, had to be connected to other supercomputers by a HIPPI bus, and its data transfer rates would peak at about 40 MB/s. With 16 well tuned and well configured HPSS PC servers and 16 equally well tuned and well configured GPFS servers you should be able to move data at 320 MB/s in each direction. Note that whichever direction you move the data in, you're always slowed down to the write speed on the other side. I have seen inexpensive PC attached IDA disk arrays that supported writes at 20 MB/s. So, $20\times16=320$.
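The scaling argument above can be sketched as a small calculation. This is only an idealized model, assuming perfectly linear scaling and assuming that the per-server write rate on the receiving side is the sole bottleneck; the function name `aggregate_rate_mb_s` is made up for this illustration and is not part of HPSS or GPFS.

```python
def aggregate_rate_mb_s(n_servers: int, write_rate_mb_s: float) -> float:
    """Idealized aggregate rate (MB/s) for a file striped over n_servers.

    Assumption: scaling is perfectly linear and the only limit is the
    write rate of each server on the receiving side.
    """
    if n_servers < 1:
        raise ValueError("need at least one server")
    return n_servers * write_rate_mb_s


if __name__ == "__main__":
    # 20 MB/s is the per-array write rate quoted in the text.
    for n in (1, 4, 8, 16):
        rate = aggregate_rate_mb_s(n, 20.0)
        print(f"{n:2d} servers x 20 MB/s = {rate:.0f} MB/s")
```

Real installations will fall short of this figure once network, tape drive, or metadata limits come into play, so the result is best read as an upper bound.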

In our case the situation is somewhat unbalanced. We have somewhat better IO on the HPSS side and somewhat worse IO on the AVIDD side, and only 4 servers on each side, so we end up with about 40 MB/s on writes to GPFS and 80 MB/s on writes to HPSS.
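To get a feeling for what these rates mean in practice, here is a rough estimate of how long a transfer would take in each direction. The 100 GB data set size is a hypothetical example, 1 GB is taken to be 1000 MB, and the helper `transfer_time_s` is an illustrative name; the calculation ignores tape mount and positioning delays.

```python
# Approximate rates quoted in the text for the 4-server configuration.
RATE_TO_GPFS_MB_S = 40.0  # HPSS -> GPFS, limited by GPFS writes
RATE_TO_HPSS_MB_S = 80.0  # GPFS -> HPSS, limited by HPSS writes


def transfer_time_s(size_gb: float, rate_mb_s: float) -> float:
    """Seconds needed to move size_gb gigabytes at rate_mb_s (1 GB = 1000 MB)."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000.0 / rate_mb_s


if __name__ == "__main__":
    size_gb = 100.0  # hypothetical data set size
    for label, rate in (("to GPFS", RATE_TO_GPFS_MB_S),
                        ("to HPSS", RATE_TO_HPSS_MB_S)):
        minutes = transfer_time_s(size_gb, rate) / 60.0
        print(f"{size_gb:.0f} GB {label} at {rate:.0f} MB/s: {minutes:.1f} min")
```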

But first things first.

Zdzislaw Meglicki