Clock rate versus IPC: the end of the road for conventional microarchitectures

The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250 nm technology to a 35 nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.
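Because performance is measured as the product of clock rate and IPC, a faster clock only pays off if IPC does not fall by a comparable factor, and small differences in annual improvement compound into large gaps over a decade. The sketch below illustrates both points with simple arithmetic. The function names and the example clock and IPC values are hypothetical illustrations, not the paper's simulation models; only the 12.5% and 50-60% annual rates come from the text above.

```python
import math


def performance(clock_ghz: float, ipc: float) -> float:
    """Throughput in billions of instructions per second: clock rate (GHz) times IPC."""
    return clock_ghz * ipc


def cumulative_gain(annual_rate: float, years: int) -> float:
    """Total speedup after compounding a fractional annual improvement for `years` years."""
    return (1.0 + annual_rate) ** years


def years_to_double(annual_rate: float) -> float:
    """Years needed for performance to double at a fixed annual improvement rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Hypothetical design point: doubling the clock while IPC drops by 20%
    # yields only a 1.6x speedup, since performance = clock rate * IPC.
    baseline = performance(clock_ghz=1.0, ipc=1.0)
    faster_clock = performance(clock_ghz=2.0, ipc=0.8)
    print(f"speedup from faster clock: {faster_clock / baseline:.2f}x")

    # Compare the 12.5% annual ceiling with the historical 50-60% range.
    for rate in (0.125, 0.50, 0.60):
        print(
            f"{rate:.1%}/yr: {cumulative_gain(rate, 10):.1f}x over 10 years, "
            f"doubles every {years_to_double(rate):.1f} years"
        )
```

At 12.5% per year, performance grows only about 3x over ten years and doubles roughly every six years, whereas rates in the 50-60% range compound to well over an order of magnitude in the same period.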
