Speculative software management of datapath-width for energy optimization

This paper evaluates compiler-level management of the processor's datapath width by exploiting dynamic narrow-width operands. We capitalize on the high frequency of these operands in multimedia programs to build static narrow-width regions that can be exposed directly to the compiler. We propose augmenting the ISA with instructions that expose the datapath and register widths to the compiler. Simple exception management allows this exposure to be merely speculative. In this way, the software can speculatively run a program on a narrower datapath in order to save energy. For this purpose, we introduce a novel register file organization, the byte-slice register file, which allows the width of the register file to be dynamically reconfigured, providing both static and dynamic energy savings. We show that combining the byte-slice register file with clock-gating of the datapath on a per-region basis saves up to 17% of the datapath dynamic energy and reduces the register file static energy by 22%.
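
As a minimal illustration of what a "narrow-width operand" means, the sketch below computes how many bytes a signed value actually needs and the width a code region would require if every observed operand had to fit. The function names, the 4-byte word size, and the two's-complement assumption are hypothetical choices made for this example and are not taken from the paper.

```python
def significant_bytes(value: int, word_bytes: int = 4) -> int:
    """Smallest byte count that holds `value` as a signed two's-complement integer.

    `word_bytes` is an assumed machine word size (4 bytes here).
    """
    for n in range(1, word_bytes + 1):
        lo = -(1 << (8 * n - 1))
        hi = (1 << (8 * n - 1)) - 1
        if lo <= value <= hi:
            return n
    raise ValueError("value does not fit in the machine word")


def region_width(operands, word_bytes: int = 4) -> int:
    """Width in bytes needed for a region so that all observed operands fit."""
    return max(significant_bytes(v, word_bytes) for v in operands)


if __name__ == "__main__":
    # Hypothetical operand values observed while profiling one region.
    observed = [3, -100, 200]
    print(region_width(observed))
```

Under these assumptions, a region whose observed operands all fit in one or two bytes is a candidate for running on a narrower datapath, with a wider operand at run time being the case the speculative mechanism has to handle.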
