Architectural and compiler techniques for energy reduction in high-performance microprocessors

In this paper, we focus on low-power design techniques for high-performance processors at the architectural and compiler levels. We focus mainly on developing methods for reducing the energy dissipated in the on-chip caches. Energy dissipated in caches represents a substantial portion in the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. We propose a method that uses an additional minicache located between the I-Cache and the central processing unit (CPU) core and buffers instructions that are nested within loops and are continuously otherwise fetched from the I-Cache. This mechanism is combined with code modifications, through the compiler, that greatly simplify the required hardware, eliminate unnecessary instruction fetching, and consequently reduce signal switching activity and the dissipated energy. We show that the additional cache, dubbed L-Cache, is much smaller and simpler than the I-Cache when the compiler assumes the role of allocating instructions to it. Through simulation, we show that for the SPECfp95 benchmarks, the I-Cache remains disabled most of the time, and the "cheaper" extra cache is used instead. We also propose different techniques that are better adapted to nonnumeric nonloop-intensive code.

[1]  Ibrahim N. Hajj,et al.  Architectural and compiler support for energy reduction in the memory hierarchy of high performance microprocessors , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[2]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[3]  Norman P. Jouppi,et al.  WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches , 1994 .

[4]  Dirk Grunwald,et al.  Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[5]  Daniel W. Dobberpuhl,et al.  The Design of a High Performance Low Power Microprocessor , 1996, ESSCIRC '96: Proceedings of the 22nd European Solid-State Circuits Conference.

[6]  Scott McFarling,et al.  Program optimization for instruction caches , 1989, ASPLOS III.

[7]  Sharad Malik,et al.  Power analysis of embedded software: a first step towards software power minimization , 1994, IEEE Trans. Very Large Scale Integr. Syst..

[8]  Andrew Wolfe,et al.  Compilation techniques for low energy: an overview , 1994, Proceedings of 1994 IEEE Symposium on Low Power Electronics.

[9]  Sharad Malik,et al.  Instruction level power analysis and optimization of software , 1996, Proceedings of 9th International Conference on VLSI Design.

[10]  Ibrahim N. Hajj,et al.  An analytical, transistor-level energy model for SRAM-based caches , 1999, ISCAS'99. Proceedings of the 1999 IEEE International Symposium on Circuits and Systems VLSI (Cat. No.99CH36349).

[11]  Eric Rotenberg,et al.  Assigning confidence to conditional branch predictions , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[12]  Hugo De Man,et al.  Power exploration for data dominated video applications , 1996, ISLPED '96.

[13]  Mary Jane Irwin,et al.  Techniques for low energy software , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[14]  Ruben W. Castelino,et al.  Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..

[15]  Hugo De Man,et al.  Formalized methodology for data reuse exploration in hierarchical memory mappings , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[16]  Raminder Singh Bajwa,et al.  Instruction buffering to reduce power in processors for signal processing , 1997, IEEE Trans. Very Large Scale Integr. Syst..

[17]  Steven W. K. Tjiang,et al.  A 32b microprocessor with on-chip 2Kbyte instruction cache , 1987, 1987 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[18]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[19]  Robert J. Fowler,et al.  MINT: a front end for efficient simulation of shared-memory multiprocessors , 1994, Proceedings of International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.