Coming challenges in microarchitecture and architecture

In the past several decades, the world of computers and especially that of microprocessors has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advancements in process technology, microarchitecture, architecture, and design and development tools. While the pace of this progress has been quite impressive over the last two decades, it has become harder and harder to keep up this pace. New process technology requires more expensive megafabs and new performance levels require larger die, higher power consumption, and enormous design and validation effort. Furthermore, as CMOS technology continues to advance, microprocessor design is exposed to a new set of challenges. In the near future, microarchitecture has to consider and explicitly manage the limits of semiconductor technology, such as wire delays, power dissipation, and soft errors. In this paper we describe the role of microarchitecture in the computer world present the challenges ahead of us, and highlight areas where microarchitecture can help address these challenges.

[1]  Stéphan Jourdan,et al.  Speculation techniques for improving load related instruction scheduling , 1999, ISCA.

[2]  Keith Diefendorff,et al.  Power4 focuses on memory bandwidth , 1999 .

[3]  Yale N. Patt,et al.  One Billion Transistors, One Uniprocessor, One Chip , 1997, Computer.

[4]  R. M. Tomasulo,et al.  An efficient algorithm for exploiting multiple arithmetic units , 1995 .

[5]  Stéphan Jourdan,et al.  A novel renaming scheme to exploit value temporal locality through physical register reuse and unification , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[6]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[7]  S. McFarling Combining Branch Predictors , 1993 .

[8]  Leon Lantz,et al.  Soft errors induced by alpha particles , 1996, IEEE Trans. Reliab..

[9]  B. Ramakrishna Rau,et al.  EPIC: Explicititly Parallel Instruction Computing , 2000, Computer.

[10]  Avtar Saini,et al.  Design of the Intel Pentium processor , 1993, Proceedings of 1993 IEEE International Conference on Computer Design ICCD'93.

[11]  Huy Nguyen,et al.  AltiVec/sup TM/: bringing vector technology to the PowerPC/sup TM/ processor family , 1999, 1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305).

[12]  L. Geppert,et al.  Transmeta's magic show [microprocessor chips] , 2000 .

[13]  P. Bai,et al.  A high performance 180 nm generation logic technology , 1998, International Electron Devices Meeting 1998. Technical Digest (Cat. No.98CH36217).

[14]  Alan Jay Smith,et al.  Cache Memories , 1982, CSUR.

[15]  Stamatis Vassiliadis,et al.  Architectural effects on dual instruction issue with interlock collapsing ALUs , 1993, Proceedings of Phoenix Conference on Computers and Communications.

[16]  Augustus K. Uht,et al.  Disjoint eager execution: an optimal form of speculative execution , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[17]  Yoav Almog,et al.  eXtended block cache , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[18]  Dean M. Tullsen,et al.  Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading , 1997, TOCS.

[19]  Stéphan Jourdan,et al.  Early load address resolution via register tracking , 2000, ISCA '00.

[20]  Trevor Mudge,et al.  The role of adaptivity in two-level adaptive branch prediction , 1995, MICRO 1995.

[21]  Andrew F. Glew MLP yes! ILP no , 1998, ASPLOS 1998.

[22]  David B. Papworth Tuning the Pentium Pro microarchitecture , 1996, IEEE Micro.

[23]  Andreas Moshovos,et al.  Streamlining inter-operation memory communication via data dependence prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[24]  Norman P. Jouppi,et al.  The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[25]  Shih-Lien Lu,et al.  Non-stalling counterflow architecture , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[26]  R. Allmon,et al.  High-performance microprocessor design , 1998, IEEE J. Solid State Circuits.

[27]  Mikko H. Lipasti,et al.  Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[28]  Alexandre E. Eichenberger,et al.  Stage scheduling: a technique to reduce the register requirements of a module schedule , 1995, MICRO 1995.

[29]  Steven A. Przybylski,et al.  Cache and memory hierarchy design: a performance-directed approach , 1990 .

[30]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[31]  M. Bohr Interconnect scaling-the real limiter to high performance ULSI , 1995, Proceedings of International Electron Devices Meeting.

[32]  Bruce D. Shriver,et al.  The anatomy of a high-performance microprocessor - a systems perspective , 1998 .

[33]  Jean-Loup Baer,et al.  Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.

[34]  James E. Smith,et al.  The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[35]  Avi Mendelson,et al.  Performance and hardware complexity tradeoffs in designing multithreaded architectures , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[36]  John Paul Shen,et al.  The block-based trace cache , 1999, ISCA.

[37]  J. Maiz,et al.  Alpha-SER modeling and simulation for sub-0.25 /spl mu/m CMOS technology , 1999, 1999 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.99CH36325).

[38]  G.S. Sohi,et al.  Dynamic instruction reuse , 1997, ISCA '97.

[39]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[40]  Norman P. Jouppi,et al.  Performance of image and video processing with general-purpose processors and media ISA extensions , 1999, ISCA.

[41]  Yale N. Patt,et al.  Putting the fill unit to work: dynamic optimizations for trace cache microprocessors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[42]  M.J. Flynn,et al.  Deep submicron microprocessor design issues , 1999, IEEE Micro.

[43]  David W. Wall,et al.  Limits of instruction-level parallelism , 1991, ASPLOS IV.

[44]  Fred Weber,et al.  AMD 3DNow! technology: architecture and implementations , 1999, IEEE Micro.

[45]  Avi Mendelson,et al.  Using value prediction to increase the power of speculative execution hardware , 1998, TOCS.

[46]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[47]  R. Senthinathan,et al.  A 600 MHz IA-32 microprocessor with enhanced data streaming for graphics and video , 1999, 1999 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC. First Edition (Cat. No.99CH36278).

[48]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[49]  S. Richardson Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation , 1992 .

[50]  Doug Matzke,et al.  Will Physical Scalability Sabotage Performance Gains? , 1997, Computer.

[51]  Donald S. Fussell,et al.  Energy-efficient instruction set architecture for CMOS microprocessors , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[52]  Pius Ng,et al.  A comparision of superscalar and decoupled access/execute architectures , 1993, MICRO 1993.

[53]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[54]  Thomas D. Burd,et al.  The simulation and evaluation of dynamic voltage scaling algorithms , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[55]  Yale N. Patt,et al.  An effective programmable prefetch engine for on-chip caches , 1995, MICRO 1995.

[56]  Eric Rotenberg,et al.  Assigning confidence to conditional branch predictions , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[57]  J. E. Thornton,et al.  Parallel operation in the control data 6600 , 1964, AFIPS '64 (Fall, part II).

[58]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[59]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[60]  Kewal K. Saluja,et al.  A study of time-redundant fault tolerance techniques for high-performance pipelined computers , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[61]  Yale N. Patt,et al.  Alternative implementations of hybrid branch predictors , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[62]  Yale N. Patt,et al.  Branch history table indexing to prevent pipeline bubbles in wide-issue superscalar processors , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.

[63]  Haitham Akkary,et al.  A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[64]  Uri C. Weiser,et al.  Correlated load-address predictors , 1999, ISCA.

[65]  Joseph A. Fisher,et al.  Very Long Instruction Word architectures and the ELI-512 , 1983, ISCA '83.