Productive petascale computing: requirements, hardware, and software

Supercomputer designers traditionally focus on low-level hardware performance criteria such as CPU cycle speed, disk bandwidth, and memory latency. The High-Performance Computing (HPC) community has more recently begun to realize that escalating hardware performance is, by itself, contributing less and less to real productivity: the ability to develop and deploy high-performance supercomputer applications at acceptable time and cost. The Defense Advanced Research Projects Agency (DARPA) High Productivity Computing Systems (HPCS) initiative challenged industry vendors to design a new generation of supercomputers that would deliver a 10x improvement in this newly acknowledged but poorly understood domain of real productivity. Sun Microsystems, choosing to abandon customary evolutionary approaches, responded with two revolutionary decisions. The first was to investigate the nature of supercomputer productivity in the full context of use, which includes people, organizations, goals, practices, and skills as well as processors, disks, memory, and software. The second was to completely rethink the design of supercomputing systems, informed by productivity-based requirements and driven by recent technological breakthroughs. Crucial to the implementation of these decisions was the establishment of multidisciplinary, closely collaborating teams that conducted research into productivity and developed the many closely intertwined design decisions needed to meet DARPA's challenge. Among the most significant results from Sun's productivity research was a detailed diagnosis of software development as the dominant barrier to productivity improvements in the HPC community. The level of expertise required, combined with the amount of effort needed to develop conventional HPC codes, has already created a crisis of productivity. Even worse, there is no path forward within the existing paradigm that will significantly increase productivity as hardware systems scale up.
The same issues also prevent HPC from "scaling out" to a broader class of applications. This diagnosis led to design requirements that address specific issues behind the expertise and effort bottlenecks. Sun's design teams explored complex, system-wide tradeoffs needed to meet these requirements in all aspects of the design, including reliability, performance, programmability, and ease of administration. These tradeoffs drew on technological advances in massive chip multithreading, extremely high-performance interconnects, resource virtualization, and programming language design. The outcome was the design for a machine to operate at petascale, with extremely high reliability and a greatly simplified programming model. Although this design supports existing codes and software technologies, both crucial requirements, it also anticipates that the greatest productivity breakthroughs will follow from dramatic changes in how HPC codes are developed, changes that require a system of the type designed by Sun's HPCS team.
