Overview of recent supercomputers

2.1 The main architectural classes
2.2 Shared-memory SIMD machines
2.3 Distributed-memory SIMD machines
2.4 Shared-memory MIMD machines
2.5 Distributed-memory MIMD machines
2.6 ccNUMA machines
2.7 Clusters
2.8 Processors
2.8.1 AMD Phenom
2.8.2 IBM POWER6
2.8.3 IBM PowerPC 970 processor
2.8.4 IBM BlueGene processors
2.8.5 Intel Itanium 2
2.8.6 Intel Xeon
2.8.7 The MIPS processor
2.8.8 The SPARC processors
2.9 Computational accelerators
2.9.1 Graphical Processing Units
2.9.2 General computational accelerators
2.9.3 FPGA-based accelerators
2.10 Networks
2.10.1 Infiniband
2.10.2 InfiniPath
2.10.3 Myrinet
2.10.4 QsNet
