The High Performance Storage System, HPSS, is a hierarchical massive data storage system that can store many petabytes of data on tape cartridges mounted inside automated silos and, if appropriately configured, can transfer the data at more than one GB/s. HPSS is designed specifically to work with very large data objects and to serve clusters with parallel file systems such as GPFS (see section 3.4.1).
HPSS is used by Trilab (i.e., LANL, LLNL and Sandia), BAE Systems (they have more than twenty HPSS installations), SDSC, NASA, Oak Ridge, Argonne (ANL), the National Climatic Data Center (NCDC), the National Centers for Environmental Prediction (NCEP), Brookhaven, JPL, SLAC, three research institutes in Japan (KEK, RIKEN and ICRR), one research institute in Korea (KISTI), the European Centre for Medium-Range Weather Forecasts (ECMWF), the French Atomic Energy Commission (CEA) and Institut National de Physique Nucléaire et de Physique des Particules (IN2P3), the University of Stuttgart in Germany, Indiana University, of course, and some other large customers. Although the number of HPSS users is not very large, the amount of data these users keep on HPSS is more than 50% of all the world's data, sic! HPSS is a very serious system for very serious Men in Black. It is not your average off-the-shelf Legato.
HPSS has been a remarkable success at Indiana University, even though we have not made much use of it in the high performance computing context yet. But HPSS is very flexible and it can be used for a lot of things.
Yet, as with any other system of this type, and there aren't that many, you must always remember when you work with HPSS that it is a tape storage system, even though it presents you with a file system interface when you connect to it with ftp, hsi or pftp. This has some important ramifications.
Just as GPFS is a truly parallel file system, HPSS is a truly parallel massive data storage system. This is why the two couple so well.
HPSS files can be striped over devices connected to multiple HPSS
servers. It is possible to configure data transfer between HPSS and
GPFS in such a way that a file is moved in parallel between HPSS
servers and GPFS servers. This operation is highly scalable, i.e., you
can stripe an HPSS file and its GPFS image over more and more servers
and the data transfer rate will scale linearly with the number of
servers added. But such scalability is costly, since every new server
and disk array you add costs at least a few thousand dollars.
Still, a few thousand dollars for a GPFS or an HPSS server is very little
compared to what such systems used to cost in the past. For example, a Convex
machine, which used to be amongst the best servers for the UniTree massive
data storage system, cost several hundred thousand dollars. It had
to be connected to other supercomputers by a HIPPI bus, and data transfer
rates would peak at about 40 MB/s. With 16 well tuned and well configured
HPSS PC servers and 16 equally well tuned and well configured GPFS servers
you should be able to move data at 360 MB/s in each direction. Note that
whichever direction you move the data in, you're always slowed down to the
write speed on the other side. I have seen inexpensive PC-attached
IDE disk arrays that supported writes at 20 MB/s.
So,
.
In our case the situation is somewhat unbalanced. We have somewhat better I/O on the HPSS side and somewhat worse I/O on the AVIDD side, and only 4 servers on each side, so we end up with about 40 MB/s on writes to GPFS and 80 MB/s on writes to HPSS.
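The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not a description of how HPSS or GPFS actually schedule their transfers. It assumes perfectly linear scaling and takes 1 GB to be 1000 MB. The function names and the 100 GB example file are made up for illustration. The per-server rates are simply the aggregate figures quoted above divided by the 4 servers on each side.

```python
def per_server_rate(aggregate_mb_s, n_servers):
    """Average write rate per server, in MB/s, implied by an aggregate rate."""
    return aggregate_mb_s / n_servers


def aggregate_rate(n_servers, per_server_write_mb_s):
    """Idealised aggregate rate in MB/s, assuming perfectly linear scaling.

    The transfer is limited by the write speed on the receiving side,
    so per_server_write_mb_s is the write rate of one receiving server.
    """
    return n_servers * per_server_write_mb_s


def transfer_time_s(size_gb, rate_mb_s):
    """Seconds needed to move size_gb gigabytes (assumption: 1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_s


if __name__ == "__main__":
    # Our 4 + 4 server configuration, figures taken from the text.
    print(per_server_rate(40, 4))    # writes to GPFS: 10.0 MB/s per server
    print(per_server_rate(80, 4))    # writes to HPSS: 20.0 MB/s per server

    # 16 receiving servers, each writing at 20 MB/s.
    print(aggregate_rate(16, 20))    # 320 MB/s

    # Hypothetical 100 GB file moved in each direction.
    print(transfer_time_s(100, 40))  # into GPFS: 2500.0 s
    print(transfer_time_s(100, 80))  # into HPSS: 1250.0 s
```

Note that 16 servers at exactly 20 MB/s each give 320 MB/s in this model. The 360 MB/s mentioned earlier would correspond to about 22.5 MB/s per server.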
But first things first.