Low power GPGPU computation with imprecise hardware

Massively parallel computation on GPUs significantly boosts the performance of compute-intensive applications but creates power and thermal issues that limit further performance scaling. This paper demonstrates significant GPGPU power savings by relaxing application accuracy requirements, which enables the use of low-power imprecise hardware (IHW). A synthesized set of novel imprecise floating-point arithmetic units is presented. GPGPU-Sim and GPUWattch are used to estimate the impact of IHW units on output quality and system-level power consumption, providing a quality-power tradeoff model for application-specific optimization. Experimental results for a 45 nm process show up to 32% power savings with negligible impact on output quality.
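
To give a feel for the quality side of such a tradeoff, the following is a minimal software sketch of an imprecise floating-point multiply. It is not the IHW design presented in the paper: mantissa truncation is swapped in here as one simple, well-known form of imprecision, and the names `truncate_mantissa`, `imprecise_mul`, `mean_relative_error` and the parameter `kept_bits` are hypothetical. The sketch measures output error only and says nothing about power.

```python
import struct


def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Round x to float32, then zero all but the top kept_bits mantissa bits.

    Assumption: truncation stands in for an imprecise unit; it is not the
    paper's hardware design.
    """
    if not 0 <= kept_bits <= 23:
        raise ValueError("kept_bits must be in [0, 23]")
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign, exponent, and the top kept_bits of the 23-bit mantissa.
    mask = (0xFFFFFFFF << (23 - kept_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def imprecise_mul(a: float, b: float, kept_bits: int) -> float:
    """Multiply with operands and result truncated to kept_bits mantissa bits."""
    product = truncate_mantissa(a, kept_bits) * truncate_mantissa(b, kept_bits)
    return truncate_mantissa(product, kept_bits)


def mean_relative_error(pairs, kept_bits: int) -> float:
    """Average relative error of imprecise_mul against the double-precision product.

    Assumes every exact product is nonzero.
    """
    total = 0.0
    for a, b in pairs:
        exact = a * b
        total += abs(imprecise_mul(a, b, kept_bits) - exact) / abs(exact)
    return total / len(pairs)


if __name__ == "__main__":
    samples = [(1.1, 2.3), (3.7, 0.9), (12.345, 6.789)]
    for kept in (4, 8, 16, 23):
        print(kept, mean_relative_error(samples, kept))
```

Sweeping `kept_bits` in this way yields an error curve. Pairing such a curve with per-unit power estimates from synthesis would be one way to build a quality-power model of the kind the abstract describes.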
