Paper information - Dynamic IPC/clock rate optimization

Current microprocessor designs set the functionality and clock rate of the chip at design time based on the configuration that achieves the best overall performance over a range of target applications. The result may be poor performance when running applications whose requirements are not well-matched to the particular hardware organization chosen. We present a new approach called Complexity-Adaptive Processors (CAPs) in which the IPC/clock rate tradeoff can be altered at runtime to dynamically match the changing requirements of the instruction stream. By exploiting repeater methodologies used increasingly in deep sub-micron designs, CAPs achieve this flexibility with potentially no cycle time impact compared to a fixed architecture. Our preliminary results in applying this approach to on-chip caches and instruction queues indicate that CAPs have the potential to significantly outperform conventional approaches on workloads containing both general-purpose and scientific applications.
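The tradeoff described in the abstract can be illustrated with a small sketch. It assumes that delivered performance is the product of IPC and clock rate, and that a larger structure (for example a bigger cache or instruction queue) raises IPC but lowers the clock rate. The `Config` and `Phase` types, the configuration names, and every number below are hypothetical and chosen only for illustration; they are not taken from the paper, and the cost of switching configurations is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One hardware configuration (hypothetical, not from the paper)."""
    name: str
    clock_ghz: float  # assumed clock rate while this configuration is active


@dataclass(frozen=True)
class Phase:
    """A stretch of the instruction stream with roughly stable behavior."""
    instructions: int
    ipc: dict  # assumed IPC of this phase, keyed by configuration name


def throughput(phase, config):
    """Instructions per nanosecond: IPC multiplied by clock rate in GHz."""
    return phase.ipc[config.name] * config.clock_ghz


def best_config(phase, configs):
    """Pick the configuration with the highest IPC x clock rate for a phase."""
    return max(configs, key=lambda c: throughput(phase, c))


def fixed_time_ns(phases, config):
    """Run time when one configuration is chosen at design time."""
    return sum(p.instructions / throughput(p, config) for p in phases)


def adaptive_time_ns(phases, configs):
    """Run time when the configuration is re-chosen per phase.

    Simplification: reconfiguration is assumed to be free.
    """
    return sum(
        p.instructions / throughput(p, best_config(p, configs)) for p in phases
    )


# Illustrative numbers only: the large structure clocks slower but helps
# the second phase far more than the first.
SMALL = Config("small", clock_ghz=1.0)
LARGE = Config("large", clock_ghz=0.8)
CONFIGS = [SMALL, LARGE]
PHASES = [
    Phase(1000, {"small": 1.0, "large": 1.1}),  # gains little from size
    Phase(1000, {"small": 1.0, "large": 2.0}),  # gains a lot from size
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"fixed {cfg.name}: {fixed_time_ns(PHASES, cfg):.1f} ns")
    print(f"adaptive: {adaptive_time_ns(PHASES, CONFIGS):.1f} ns")
```

Under these assumed numbers, neither fixed configuration is best for both phases, so choosing per phase can never do worse than the better fixed choice. A real design would also have to account for reconfiguration cost and for how phases are detected, which this sketch leaves out.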

David H. Albonesi
