Reducing power by optimizing the necessary precision/range of floating-point arithmetic

In low-power systems, the power cost of floating-point (FP) hardware is often prohibitive. This paper explores ways of reducing FP power consumption by minimizing the bitwidth used to represent FP data. Analysis of several FP programs that manipulate low-resolution human sensory data shows that these programs suffer no loss of accuracy even with a significant reduction in bitwidth. Most FP programs in our benchmark suite produce the same output even when the mantissa bitwidth is halved. This FP bitwidth reduction can deliver a significant power saving through the use of a variable-bitwidth FP unit. Our results show that this bitwidth reduction technique can reduce multiplier energy per operation in the FP unit by up to 66% without sacrificing any program accuracy.
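To make the idea of mantissa bitwidth reduction concrete, the following is a minimal software sketch and not the hardware mechanism studied in the paper. It assumes IEEE 754 single precision (23 explicit mantissa bits) and emulates a narrower mantissa by zeroing the low-order bits (truncation toward zero). The function name `truncate_mantissa` and the parameter `keep_bits` are hypothetical names introduced only for illustration.

```python
import struct


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Emulate a reduced-mantissa float32 by zeroing low-order mantissa bits.

    Assumption: IEEE 754 single precision with 23 explicit mantissa bits.
    The value is first rounded to float32, then truncated toward zero.
    NaN and infinity are not treated specially in this sketch.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Build a mask that clears the (23 - keep_bits) lowest mantissa bits
    # while leaving the sign and exponent fields untouched.
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    # Reinterpret the masked integer as a float32 again.
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def reduced_multiply(a: float, b: float, keep_bits: int) -> float:
    """Multiply two values with operands and result held at reduced precision."""
    product = truncate_mantissa(a, keep_bits) * truncate_mantissa(b, keep_bits)
    return truncate_mantissa(product, keep_bits)
```

As a usage example, scaling a sample value of 200 by 0.5 and rounding to an integer gives the same result with 12 mantissa bits as with all 23, which is the kind of behavior the abstract describes for low-resolution sensory data. Whether a given program tolerates a given bitwidth still has to be checked case by case.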
