Bypass aware instruction scheduling for register file power reduction

Register files have some of the highest power densities in a processor, so designers have investigated several architectural strategies for reducing register file power. One such strategy is "On Demand RF Read", in which the register file is read only if the operand value is not available from the bypasses. We show in this paper that scheduling instructions so that they transfer operands via the bypasses, rather than reading them from the register file, yields significant additional reductions in register file power consumption. Such scheduling requires the compiler to be aware of the bypasses in the processor pipeline. We develop several bypass-aware instruction scheduling heuristics of varying time complexity and study their effectiveness on the Intel XScale processor pipeline running the MiBench benchmarks. Our experimental results show additional power reductions of up to 26%, and 12% on average, beyond the register file power reduction achieved by existing techniques.
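The idea of scheduling for bypasses can be illustrated with a toy model. The sketch below is not one of the heuristics developed in the paper, and it does not model the XScale pipeline. It assumes a hypothetical single-issue machine in which a result stays on the bypass for `BYPASS_WINDOW` issue slots after its producer, each register is written at most once, and every operand that misses the bypass costs one register file read. All names (`Instr`, `rf_reads`, `bypass_aware_schedule`, `BYPASS_WINDOW`) are illustrative assumptions. The greedy scheduler simply picks, among the ready instructions, the one with the most operands still on the bypass.

```python
from dataclasses import dataclass

# Assumption: a result is forwardable for this many issue slots after its producer.
BYPASS_WINDOW = 1


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str     # register written (assumed to be written only once)
    srcs: tuple   # registers read


def rf_reads(order, window=BYPASS_WINDOW):
    """Count operand reads that miss the bypass and go to the register file."""
    produced_at = {}  # register -> issue slot of its producer
    reads = 0
    for slot, ins in enumerate(order):
        for reg in ins.srcs:
            p = produced_at.get(reg)
            # Live-in values and values that left the bypass need an RF read.
            if p is None or slot - p > window:
                reads += 1
        produced_at[ins.dest] = slot
    return reads


def bypass_aware_schedule(instrs, window=BYPASS_WINDOW):
    """Greedy list scheduling that prefers operands still on the bypass."""
    producer = {i.dest: i for i in instrs}
    # True (read-after-write) dependences only; valid under single assignment.
    preds = {i: {producer[r] for r in i.srcs if r in producer} for i in instrs}
    slot_of, scheduled, remaining = {}, [], list(instrs)
    while remaining:
        slot = len(scheduled)
        ready = [i for i in remaining if all(p in slot_of for p in preds[i])]

        def bypassed(i):
            # Number of producers whose result is still forwardable at this slot.
            return sum(1 for p in preds[i] if slot - slot_of[p] <= window)

        best = max(ready, key=bypassed)  # ties fall back to original order
        scheduled.append(best)
        slot_of[best] = slot
        remaining.remove(best)
    return scheduled


# Three independent producer/consumer pairs; r0 is a live-in register.
program = [
    Instr("a", "r1", ("r0",)),
    Instr("b", "r2", ("r0",)),
    Instr("c", "r3", ("r0",)),
    Instr("d", "r4", ("r1",)),
    Instr("e", "r5", ("r2",)),
    Instr("f", "r6", ("r3",)),
]

reordered = bypass_aware_schedule(program)
print([i.name for i in reordered])            # ['a', 'd', 'b', 'e', 'c', 'f']
print(rf_reads(program), rf_reads(reordered))  # 6 3
```

In this toy example, placing each consumer directly after its producer halves the register file reads. A greedy choice like this is not guaranteed to be optimal, and a practical scheduler would also have to account for instruction latencies and avoid lengthening the schedule.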
