Precision selection for energy-efficient pixel shaders

In this work, we seek to realize energy savings in modern pixel shaders by reducing the precision of their arithmetic. We explore three schemes for controlling this reduction. The first is a static analysis technique, which analyzes shader programs to choose precisions with guaranteed error bounds. This approach may be too conservative in practice because it cannot take advantage of run-time information, so we also examine two dynamic methods that take the actual data values into account: a programmer-directed approach and a closed-loop error-tracking approach, both of which can lead to higher savings. For the closed-loop method, we developed several heuristics to control how the precisions change over time. We simulate several series of frames from commercial applications to evaluate the performance of these schemes. The average savings in the pixel shader's arithmetic are 31% for the static approach and 70% and 62% for the programmer-directed and closed-loop approaches, respectively, which could result in savings of as much as 10-20% of the GPU's energy as a whole.
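To make the closed-loop idea concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes that reduced precision can be emulated by truncating float32 mantissa bits, and that the controller adds a bit when the measured error exceeds a threshold and drops one when the error is well below it. The names `truncate_mantissa`, `shade`, `adjust_precision`, and `run_frames`, the toy multiply-add "shader", and the threshold values are all hypothetical. The sketch also measures error against a full-precision result for every pixel, a simplification that real hardware would have to avoid for the approach to save energy.

```python
import struct


def truncate_mantissa(x: float, bits: int) -> float:
    """Round x to float32, then keep only the top `bits` of its 23 mantissa bits."""
    if not 0 <= bits <= 23:
        raise ValueError("bits must be in [0, 23]")
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - bits)) & 0xFFFFFFFF  # zero the low mantissa bits
    (out,) = struct.unpack("<f", struct.pack("<I", raw & mask))
    return out


def shade(u: float, v: float, bits: int) -> float:
    """Hypothetical stand-in for a shader: a multiply-add with truncated intermediates."""
    def t(x: float) -> float:
        return truncate_mantissa(x, bits)
    return t(t(t(u) * t(v)) + t(0.25))


def adjust_precision(bits: int, error: float, threshold: float,
                     lo: int = 4, hi: int = 23) -> int:
    """Assumed heuristic: raise precision on excess error, lower it when error is small."""
    if error > threshold:
        return min(hi, bits + 1)
    if error < threshold / 4:
        return max(lo, bits - 1)
    return bits


def run_frames(frames, threshold: float = 1 / 256, start_bits: int = 23):
    """Process frames in order; return the (bits used, max error) pair for each frame."""
    bits = start_bits
    history = []
    for pixels in frames:
        # Error is measured against the 23-bit result (simplification, see lead-in).
        err = max(abs(shade(u, v, bits) - shade(u, v, 23)) for u, v in pixels)
        history.append((bits, err))
        bits = adjust_precision(bits, err, threshold)
    return history
```

Calling `run_frames` on a list of frames, each a list of `(u, v)` pairs, returns the precision used per frame, which shows how the bit count settles as the error approaches the chosen threshold.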