The Future of Computing Performance: Game Over or Next Level?

The end of dramatic exponential growth in single-processor performance marks the end of the dominance of the single microprocessor in computing. The era of sequential computing must give way to a new era in which parallelism is at the forefront. Although important scientific and engineering challenges lie ahead, this is an opportune time for innovation in programming systems and computing architectures. We have already begun to see diversity in computer designs to optimize for such considerations as power and throughput. The next generation of discoveries is likely to require advances at both the hardware and software levels of computing systems.

There is no guarantee that we can make parallel computing as common and easy to use as yesterday's sequential single-processor computer systems, but unless we aggressively pursue efforts suggested by the recommendations in this book, it will be "game over" for growth in computing performance. If parallel programming and related software efforts fail to become widespread, the development of exciting new applications that drive the computer industry will stall; if such innovation stalls, many other parts of the economy will follow suit.

The Future of Computing Performance describes the factors that have led to the future limitations on growth for single processors that are based on complementary metal oxide semiconductor (CMOS) technology. It explores challenges inherent in parallel computing and architecture, including ever-increasing power consumption and the escalated requirements for heat dissipation. The book delineates a research, practice, and education agenda to help overcome these challenges. The Future of Computing Performance will guide researchers, manufacturers, and information technology professionals in the right direction for sustainable growth in computer performance, so that we may all enjoy the next level of benefits to society.
