The future of microprocessors

Energy efficiency is the new fundamental limiter of processor performance, far more so than the number of processors.
