Paper information - Overview of recent supercomputers

Overview of recent supercomputers

2.1 The main architectural classes
2.2 Shared-memory SIMD machines
2.3 Distributed-memory SIMD machines
2.4 Shared-memory MIMD machines
2.5 Distributed-memory MIMD machines
2.6 ccNUMA machines
2.7 Clusters
2.8 Processors
    2.8.1 AMD Phenom
    2.8.2 IBM POWER6
    2.8.3 IBM PowerPC 970 processor
    2.8.4 IBM BlueGene processors
    2.8.5 Intel Itanium 2
    2.8.6 Intel Xeon
    2.8.7 The MIPS processor
    2.8.8 The SPARC processors
2.9 Computational accelerators
    2.9.1 Graphical Processing Units
    2.9.2 General computational accelerators
    2.9.3 FPGA-based accelerators
2.10 Networks
    2.10.1 Infiniband
    2.10.2 InfiniPath
    2.10.3 Myrinet
    2.10.4 QsNet

Aad J. van der Steen

[1] Torsten Suel, et al. BSPlib: The BSP programming library, 1998, Parallel Comput.

[2] Anoop Gupta, et al. Parallel computer architecture - a hardware / software approach, 1998.

[3] Greg Wilson, et al. Past, Present, Parallel, 1991, Springer London.

[4] Jack J. Dongarra, et al. Performance of various computers using standard linear equations software in a Fortran environment, 1987, SGNM.

[5] Jean-Claude Bermond, et al. Large fault-tolerant interconnection networks, 1989, Graphs Comb.

[6] Clint Schow, et al. Get On The Optical Bus, 2010, IEEE Spectrum.

[7] Tom Shanley, et al. Infiniband Network Architecture, 2002.

[8] Hans P. Zima, et al. The Earth Simulator, 2004, Parallel Comput.

[9] Greg Wilson, et al. "Past, Present, Parallel": A Survey Of Available Parallel Computer Systems, 1991.

[10] Aad J. van der Steen. Evaluation of the Intel Clovertown Quad Core Processor, 2007.

[11] Aad J. van der Steen. An evaluation of Itanium 2-based high-end servers.

[12] Nicolai Petkov, et al. Aspects of Computational Science, 1995.

[13] Sandia Report, et al. An Analysis of the Pathscale Inc. InfiniBand Host Channel Adapter, InfiniPath, 2005.

[14] The Advantages of First-Generation Heterogeneous Computing on the Cray XT5h, 2008.

[15] J. Dongarra. Performance of various computers using standard linear equations software, 1990, CARN.

[16] Jeffery A. Kuehn, et al. An Analysis of HPCC Results on the Cray XT4, 2007.

[17] Aad J. van der Steen. The benchmark of the EuroBen group, 1991, Parallel Comput.

[18] J. Dongarra, et al. Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems), 2006, ACM/IEEE SC 2006 Conference (SC'06).

[19] Aad J. van der Steen. Benchmark results for the Hitachi S 3800, 1993.

[20] Larry Carter, et al. Performance and Programming Experience on the Tera MTA, 1999, PPSC.

[21] Jack Dongarra, et al. MPI: The Complete Reference, 1996.

[22] William J. Dally, et al. Technology-Driven, Highly-Scalable Dragonfly Topology, 2008, 2008 International Symposium on Computer Architecture.

[23] Wu-chun Feng, et al. The Quadrics Network: High-Performance Clustering Technology, 2002, IEEE Micro.

[24] Shahid H. Bokhari, et al. Sequence alignment on the Cray MTA-2, 2004, Concurr. Pract. Exp.

[25] Hiroaki Ishihata, et al. AP1000 Architecture and Performance of LU Decomposition, 1991, International Conference on Parallel Processing.

[26] Ira Krepchin. Cray Research Inc., 1993.

[27] Harvey J. Wasserman, et al. A performance comparison of three supercomputers: Fujitsu VP-2600, NEC SX-3, and CRAY Y-MP, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[28] Stein Gjessing, et al. Distributed-directory scheme: scalable coherent interface, 1990, Computer.

[29] Michael J. Flynn, et al. Some Computer Organizations and Their Effectiveness, 1972, IEEE Transactions on Computers.

[30] Patrick H. Worley, et al. Early Evaluation of the Cray X1, 2003, SC.

[31] Hiroshi Okano, et al. SPARC64 VIIIfx: A New-Generation Octocore Processor for Petascale Computing, 2010, IEEE Micro.

[32] M. Lanzagorta, et al. Early Experience with Scientific Programs on the Cray MTA-2, 2003, ACM/IEEE SC 2003 Conference (SC'03).

[33] Anthony J. G. Hey, et al. The Genesis distributed memory benchmarks, 1991, Parallel Comput.

[34] Rice University, et al. High Performance Fortran language specification, 1993.

[35] Mark Baker, et al. Cluster Computing White Paper, 2000, ArXiv.

[36] Jack Dongarra, et al. PVM: A Users' Guide and Tutorial for Network Parallel Computing, 1994.

[37] Scott Pakin, et al. A Performance Evaluation of an Alpha EV7 Processing Node, 2004, Int. J. High Perform. Comput. Appl.

[38] N. S. Barnett, et al. Private communication, 1969.

[39] Aad J. van der Steen, et al. Benchmarking the Silicon Graphics Origin2000 System, 2000.

[40] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[41] Qiang Li, et al. Local-Area MultiProcessor: the scalable coherent interface, 1994, Optics East.

[42] Alan L. Cox, et al. TreadMarks: shared memory computing on networks of workstations, 1996.

[43] D. Bailey, et al. NAS Parallel Benchmark Results 1295, 1993.

[44] Aad J. van der Steen, et al. An Evaluation of Some Beowulf Clusters, 2003, Cluster Computing.

[45] Rohit Chandra, et al. Parallel programming in OpenMP, 2000.

[46] Chris R. Jesshope, et al. Parallel Computers 2: Architecture, Programming and Algorithms, 1981.

[47] F. Petrini, et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q, 2003, ACM/IEEE SC 2003 Conference (SC'03).

[48] Jack J. Dongarra, et al. Performance of various computers using standard linear equations software in a FORTRAN environment, 1988, CARN.

[49] T. Okamoto, et al. Parallel computer ADENART - its architecture and application, 1991, ICS '91.

[50] William J. Dally, et al. The BlackWidow High-Radix Clos Network, 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[51] David B. Gustavson, et al. Scalable Coherent Interface, 1990, COMPEURO'90: Proceedings of the 1990 IEEE International Conference on Computer Systems and Software Engineering - Systems Engineering Aspects of Complex Computerized Systems.

[52] Shahid H. Bokhari, et al. Sequence alignment on the Cray MTA-2, 2003, Proceedings International Parallel and Distributed Processing Symposium.