Paper information - Merrimac: Supercomputing with Streams

Merrimac: Supercomputing with Streams

Merrimac uses a stream architecture and advanced interconnection networks to give an order of magnitude more performance per unit cost than cluster-based scientific computers built from the same technology. Organizing the computation into streams and exploiting the resulting locality using a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative applications by an order of magnitude or more. Hence a processing node with a fixed bandwidth (expensive) can support an order of magnitude more arithmetic units (inexpensive). This in turn allows a given level of performance to be achieved with fewer nodes (a 1-PFLOPS machine, for example, with just 8,192 nodes), resulting in greater reliability and simpler system management. We sketch the design of Merrimac, a streaming scientific computer that can be scaled from a $20K, 2-TFLOPS workstation to a $20M, 2-PFLOPS supercomputer, and present the results of some initial application experiments on this architecture.
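The scaling figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal prefixes (1 TFLOPS = 10^12 FLOPS, 1 PFLOPS = 10^15 FLOPS) and treats the quoted numbers as peak rates. The function names are hypothetical and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption: decimal prefixes and peak (not sustained) rates.
GFLOPS = 1e9
TFLOPS = 1e12
PFLOPS = 1e15


def per_node_gflops(total_flops: float, nodes: int) -> float:
    """Per-node rate, in GFLOPS, implied by a machine total and a node count."""
    return total_flops / nodes / GFLOPS


def dollars_per_gflops(cost_dollars: float, total_flops: float) -> float:
    """Cost per GFLOPS for a machine of the given price and total rate."""
    return cost_dollars / (total_flops / GFLOPS)


# "a 1-PFLOPS machine ... with just 8,192 nodes"
node_rate = per_node_gflops(1 * PFLOPS, 8192)

# "$20K 2 TFLOPS workstation" and "$20M 2 PFLOPS supercomputer"
workstation_cost = dollars_per_gflops(20_000, 2 * TFLOPS)
supercomputer_cost = dollars_per_gflops(20_000_000, 2 * PFLOPS)

# Node count a 2-PFLOPS machine would need at the same per-node rate
# (a derived figure, not one stated in the abstract).
nodes_for_2_pflops = (2 * PFLOPS) / (node_rate * GFLOPS)

print(f"per-node rate: {node_rate:.1f} GFLOPS")
print(f"workstation: ${workstation_cost:.2f} per GFLOPS")
print(f"supercomputer: ${supercomputer_cost:.2f} per GFLOPS")
print(f"nodes for 2 PFLOPS: {nodes_for_2_pflops:.0f}")
```

Under these assumptions the per-node rate comes out near 122 GFLOPS, and both quoted configurations work out to the same cost per GFLOPS, which is consistent with the abstract's claim that the design scales from workstation to supercomputer.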
