Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling

Traditional compiler approaches to optimize power efficiency aim to adjust voltage and frequency at runtime to match the code characteristics to the hardware (e.g., running memory-bound phases at a lower frequency). However, such approaches are constrained by three factors: (i) voltage-frequency transitions are too slow to be applied at instruction granularity, (ii) larger code regions are seldom unequivocally memory- or compute-bound, and, (iii) the available voltage scaling range for future technologies is rapidly shrinking. These factors necessitate new approaches to address power-efficiency at the code-generation level. This paper proposes one such approach to automatically generate power-efficient code using a decoupled access/execute (DAE) model. In DAE a program is split into tasks, where each task consists of two sufficiently coarse-grained phases to enable effective Dynamic Voltage Frequency Scaling (DVFS): (i) the access-phase for data prefetch (heavily memory-bound), and (ii) the execute-phase that performs the actual computation (heavily compute-bound). Our contribution is to provide a compiler methodology to automatically generate the access-phases for a task-based programming system. Our approach is capable of handling both affine (through a polyhedral analysis) and non-affine codes (through optimized task skeletons). Our evaluation shows that the automatically generated versions improve EDP by 25% on average compared to a coupled execution, without any performance degradation, and surpasses the EDP savings of the corresponding hand-crafted tasks by 5%.

[1]  Chuan-Qi Zhu,et al.  A Scheme to Enforce Data Dependence on Large Multiprocessor Systems , 1987, IEEE Transactions on Software Engineering.

[2]  Pen-Chung Yew,et al.  A Scheme to Enforce Data Dependence on Large Multiprocessor Systems , 1987, IEEE Trans. Software Eng..

[3]  Utpal Banerjee,et al.  Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.

[4]  Josep Torrellas,et al.  An efficient algorithm for the run-time parallelization of DOACROSS loops , 1994, Proceedings of Supercomputing '94.

[5]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[6]  P. Clauss Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs , 1996, ICS '96.

[7]  Alexander Schrijver,et al.  Theory of linear and integer programming , 1986, Wiley-Interscience series in discrete mathematics and optimization.

[8]  R.H. Dennard,et al.  Design Of Ion-implanted MOSFET's with Very Small Physical Dimensions , 1974, Proceedings of the IEEE.

[9]  Philip Levis,et al.  Policies for dynamic clock scheduling , 2000, OSDI.

[10]  Alan Jay Smith,et al.  Improving dynamic voltage scaling algorithms with PACE , 2001, SIGMETRICS '01.

[11]  Luca Benini,et al.  Dynamic voltage scaling and power management for portable systems , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[12]  Mahmut T. Kandemir,et al.  Energy-conscious compilation based on voltage scaling , 2002, LCTES/SCOPES '02.

[13]  Ricardo Bianchini,et al.  Application transformations for energy and performance-aware device management , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[14]  Ulrich Kremer,et al.  The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction , 2003, PLDI '03.

[15]  Juan Touriño,et al.  An Inspector-Executor Algorithm for Irregular Assignment Parallelization , 2004, ISPA.

[16]  Dean M. Tullsen,et al.  Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices , 2005, PLDI '05.

[17]  Margaret Martonosi,et al.  A dynamic compilation framework for controlling microprocessor energy and performance , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[18]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[19]  Margaret Martonosi,et al.  Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[20]  Margaret Martonosi,et al.  Dynamic-Compiler-Driven Control for Microprocessor Energy and Performance , 2006, IEEE Micro.

[21]  Weifeng Zhang,et al.  Accelerating and Adapting Precomputation Threads for Effcient Prefetching , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[22]  Robert H. Dennard,et al.  Design of ion-implanted MOSFET's with very small physical dimensions , 2007 .

[23]  Weihua Sheng,et al.  The Design and Implementation of the DVS Based Dynamic Compiler for Power Reduction , 2007, APPT.

[24]  Stefanos Kaxiras,et al.  Interval-based models for run-time DVFS orchestration in superscalar processors , 2010, CF '10.

[25]  Uday Bondhugula,et al.  Loop transformations: convexity, pruning and optimization , 2011, POPL '11.

[26]  Dean M. Tullsen,et al.  Inter-core prefetching for multicore processors using migrating helper threads , 2011, ASPLOS XVI.

[27]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[28]  Rachid Seghir ZPolyTrans : A library for computing and enumerating integer transformations of Z-Polyhedra , 2011 .

[29]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[30]  David Black-Schaffer,et al.  Towards more efficient execution: a decoupled access-execute approach , 2013, ICS '13.

[31]  Tomofumi Yuki,et al.  Folklore Confirmed: Compiling for Speed = Compiling for Energy , 2013, LCPC.

[32]  Rudolf Eigenmann,et al.  Compiler Infrastructure , 2013, International Journal of Parallel Programming.