Dynamic IPC/clock rate optimization

Current microprocessor designs set the functionality and clock rate of the chip at design time based on the configuration that achieves the best overall performance over a range of target applications. The result may be poor performance when running applications whose requirements are not well-matched to the particular hardware organization chosen. We present a new approach called Complexity-Adaptive Processors (CAPs) in which the IPC/clock rate tradeoff can be altered at runtime to dynamically match the changing requirements of the instruction stream. By exploiting repeater methodologies used increasingly in deep sub-micron designs, CAPs achieve this flexibility with potentially no cycle time impact compared to a fixed architecture. Our preliminary results in applying this approach to on-chip caches and instruction queues indicate that CAPs have the potential to significantly outperform conventional approaches on workloads containing both general-purpose and scientific applications.
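The IPC/clock rate tradeoff can be made concrete with a small sketch. It assumes that throughput is the product of IPC and clock rate, and that a larger hardware structure raises IPC for some program phases at the cost of a slower clock. The configuration names, clock rates, and IPC values are hypothetical illustrations, not measurements from this work, and the selection rule is a simplified stand-in rather than the mechanism CAPs actually use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Config:
    """One hardware configuration (hypothetical, for illustration only)."""
    name: str
    clock_ghz: float  # assumed clock rate when this configuration is active


# Assumption: enlarging a structure (e.g. a cache or queue) lowers the clock rate.
CONFIGS: List[Config] = [
    Config("small-fast", 1.0),
    Config("large-slow", 0.8),
]


def instructions_per_ns(ipc: float, clock_ghz: float) -> float:
    """Throughput = IPC x clock rate (instructions per nanosecond)."""
    return ipc * clock_ghz


def best_config(ipc_by_config: Dict[str, float]) -> Config:
    """Pick the configuration with the highest throughput for one program phase."""
    return max(
        CONFIGS,
        key=lambda c: instructions_per_ns(ipc_by_config[c.name], c.clock_ghz),
    )


# Made-up per-phase IPC values: phase A gains a lot from the larger structure,
# phase B gains very little.
phase_a = {"small-fast": 1.0, "large-slow": 1.5}
phase_b = {"small-fast": 1.2, "large-slow": 1.25}

for label, phase in (("A", phase_a), ("B", phase_b)):
    print(label, best_config(phase).name)
```

Under these made-up numbers, neither fixed configuration is best for both phases, which is the situation in which adjusting the tradeoff at runtime could pay off.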
