Reducing dynamic and leakage energy in VLIW architectures

The mobile computing device market has been growing rapidly, bringing technologies that optimize system energy to the forefront. As circuits continue to scale, it will become important to optimize both leakage and dynamic energy. Doing so effectively requires a vertical integration of techniques spanning the circuit to software levels. Schedule slacks in code executing on VLIW architectures present an opportunity for such integration. In this paper, we present three compiler-directed techniques that take advantage of schedule slacks to optimize leakage and dynamic energy consumption. Integer ALU (IALU) components operating at multiple supply voltages are designed to provide low-energy versions with different operational latencies. The goal of the first technique is to maximize the number of operations mapped to the IALU components with the lowest energy consumption without extending the schedule length. We also consider a variant of this technique that saves more energy at the cost of some performance loss. The second technique uses two leakage-control mechanisms to reduce leakage energy consumption when no operations are scheduled on a component. Our evaluation of these two approaches on fifteen benchmarks shows that which one yields larger energy gains, dynamic or leakage energy optimization, depends on the number and duration of slacks, the availability of low-energy functional units, and the relative magnitude of leakage and dynamic energy. Finally, we provide a unified energy-optimization strategy that integrates both the dynamic and leakage energy-reduction schemes. The proposed techniques have been incorporated into a cycle-accurate simulator using parameters extracted from circuit-level simulation. Our results show that the unified scheme generates better results than either the dynamic or the leakage energy-reduction technique applied independently.
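The idea behind the first technique, mapping operations with schedule slack to slower, lower-energy IALU versions without lengthening the schedule, can be illustrated with a minimal sketch. This is not the paper's algorithm: the greedy policy, the two IALU versions with their latencies and relative energies, and the names `VERSIONS`, `schedule_length`, `assign_versions`, and `max_slowdown` are all assumptions made for illustration, and resource constraints (the number of available units of each kind) are ignored.

```python
from graphlib import TopologicalSorter

# Hypothetical IALU versions: (name, latency in cycles, relative energy per op).
# Listed lowest energy first; the last entry is the nominal (fastest) unit.
VERSIONS = [("low_vdd", 2, 0.4), ("nominal", 1, 1.0)]


def schedule_length(preds, latency):
    """Critical-path length of the dependence DAG under the given latencies."""
    finish = {}
    for op in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[op]), default=0)
        finish[op] = start + latency[op]
    return max(finish.values(), default=0)


def assign_versions(preds, max_slowdown=0):
    """Greedily give each operation the lowest-energy version that fits.

    preds maps every operation to the list of operations it depends on.
    max_slowdown is the number of extra cycles tolerated beyond the
    all-nominal schedule length (0 keeps the schedule length unchanged).
    """
    nominal_name, nominal_latency, _ = VERSIONS[-1]
    latency = {op: nominal_latency for op in preds}
    choice = {op: nominal_name for op in preds}
    budget = schedule_length(preds, latency) + max_slowdown

    for op in TopologicalSorter(preds).static_order():
        for name, lat, _energy in VERSIONS:
            previous = latency[op]
            latency[op] = lat
            if schedule_length(preds, latency) <= budget:
                choice[op] = name  # fits within the budget: keep it
                break
            latency[op] = previous  # would extend the schedule: undo
    return choice


# Chain a -> b -> c is the critical path; d depends only on a and has slack.
deps = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}
print(assign_versions(deps))
print(assign_versions(deps, max_slowdown=1))
```

With `max_slowdown=0`, only the off-critical-path operation `d` moves to the low-energy unit. Allowing one extra cycle corresponds loosely to the variant that trades some performance for additional energy savings, and lets one operation on the critical path be slowed down as well.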
