
HPC Facilities at Public Supercomputer Centers

Another result of the decisions made in the mid-1990s, apart from the near-death experience of the US supercomputer industry, was that American researchers ended up with very limited access to well-engineered supercomputers in their own country.

Perhaps the only public computing facility that has provided access to such systems over the years is the Arctic Region Supercomputing Center (ARSC) in Fairbanks, Alaska. Today the center operates a 272-processor Cray T3E, a 32-processor Cray SV1ex, an 8-processor NEC SX-6 (rebadged for the US market as the Cray SX-6), and, the most recent acquisition, a Cray X1 with 128 multi-streaming processors. The latter is a highly efficient massively parallel vector system with a peak performance of 1.6 TFLOPS. It should deliver between 40% and 60% of that peak on production codes, which would make it roughly equivalent to a 20 TFLOPS IA32 cluster. (NCSA is about to install an IA32 cluster of roughly that size; see below.) The Cray X1 at ARSC is equipped with about 1 petabyte of storage in the form of Sun disk-array servers and StorageTek tape silos (we have a similar storage system here at Indiana University).
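To put rough numbers on that comparison: the vector-system efficiency is the range quoted above, while the few-percent figure for IA32 clusters is an assumption, though it is consistent with the NCSA estimate below.

    Cray X1:       1.6 TFLOPS peak  ×  40-60% efficiency  ≈  0.6-1.0 TFLOPS sustained
    IA32 cluster:  20  TFLOPS peak  ×   4-6% efficiency   ≈  0.8-1.2 TFLOPS sustained

Both machines land at roughly 1 TFLOPS on production codes, even though their peak ratings differ by more than a factor of ten.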

There are also some US Army computer centers that have well engineered machines, but they are largely inaccessible to US researchers on account of being dedicated to military work, so we won't talk about those.

The three National Supercomputer Centers, in Pittsburgh, Urbana-Champaign, and San Diego, offer only various clusters of scalar CPUs.

Pittsburgh has a 512-processor Cray T3E and a cluster comprising 750 4-way Alpha SMPs.

NCSA in Urbana-Champaign has (or will soon have) two IA32 clusters: one with 484 2-way SMP nodes dedicated to computation and 32 2-way SMP nodes dedicated to storage, and another with 1,450 2-way nodes. The latter, when it is finally installed, will yield 17.7 TFLOPS peak, which may translate to roughly 1 TFLOPS delivered on production codes, close to what the ARSC Cray X1 delivers.

You can think of our AVIDD cluster as a smaller version of this large NCSA IA32 cluster. Codes developed for AVIDD should run almost without change on the NCSA 17.7 TFLOPS system.
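As a minimal illustration of that portability, consider the sketch below: a plain MPI program in C contains nothing that is specific to any particular cluster. The compiler wrappers, MPI implementations, and batch systems do differ from site to site, so it is the job scripts that change, not the source.

    #include <stdio.h>
    #include <mpi.h>

    /* Each process reports its rank; nothing here depends on
       which IA32 cluster the code happens to run on. */
    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

On most systems this compiles with the site's MPI compiler wrapper (typically mpicc) and is launched through the local batch scheduler.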

NCSA also has two IA64 clusters, one with 128 2-way nodes dedicated to computation and 4 2-way nodes dedicated to storage, and another one with 256 2-way nodes.

Finally, SDSC in San Diego has an aging IBM SP (itself a cluster) with 1,152 POWER3 CPUs. But SDSC is in an alliance with the University of Texas and the University of Michigan, which contribute some fairly sizeable systems of their own to the pool. For example, the University of Texas contributes an IBM SP with 224 POWER4 CPUs, which is only a little less powerful than the SDSC system, and the University of Michigan contributes four clusters, which together add up to almost as much computing power as the SDSC SP.

All in all, these facilities are somewhat disappointing. Developing parallel programs for clusters of scalar CPUs is very tedious, debugging is cumbersome and takes ages, and, worst of all, the codes more often than not end up running very slowly and generating more heat than good science.

Because the fuddy-duddies who got us into this morass are still at the helm of the sinking boat, you should not expect things to change any time soon. It's going to be more of ``steady as she goes'' until the DARPA project bears fruit and we get some new toys to play with.

This situation gave rise to some concern and pointed questions in Congress, where Vincent Scarafino, the manager of numerically intensive computing at Ford Motor Company, commented: ``The federal government cannot rely on fundamental economic forces to advance high-performance computing capability. The federal government should help with the advancement of high-end processor design and other fundamental components necessary to develop well-balanced, highly capable machines. U.S. leadership is currently at risk.''

The chairman of the congressional committee that looked into these matters, Sherwood Boehlert, a representative from New York, stated that ``Lethargy is setting in [at the NSF] and I'm getting concerned. I don't want to be second to anybody.''


Zdzislaw Meglicki
2004-04-29