Modeling and analyzing the energy consumption of fork‐join‐based task parallel programs

Environmental and monetary concerns make it increasingly important to reduce energy consumption in all areas, including parallel and high performance computing. In this article, we propose an approach to reduce the energy consumed by a set of tasks executed in parallel in a fork‐join fashion. The approach consists of three parts: an analytical model for the energy consumption of a fork‐join parallel computation on processors with dynamic voltage and frequency scaling (DVFS), a theoretical specification of an energy‐optimal frequency‐scaled state, and energy minimization by computing optimal scaling factors. For larger numbers of tasks, the approach is extended by scheduling algorithms that exploit the analytical result and aim to reduce the energy consumption. Energy measurements of a complex numerical method and the SPEC CPU2006 benchmarks, as well as simulations for a large number of randomly generated tasks, illustrate and validate the energy modeling, the minimization, and the scheduling results. Copyright © 2014 John Wiley & Sons, Ltd.
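
The abstract does not state the model itself, so the following is only a minimal sketch of the kind of calculation it describes, under assumptions that are not taken from the article: each task runs on its own processor, a scaling factor s >= 1 stretches a task's runtime by s and lowers its dynamic power by s^-3 (a commonly used DVFS approximation), and every processor draws static power until the join completes. The function names and the values of `p_dyn` and `p_static` are hypothetical.

```python
def fork_join_energy(times, scales, p_dyn=1.0, p_static=0.2):
    """Energy of one fork-join section under an ASSUMED cubic DVFS power model.

    times  -- unscaled runtime of each task (one task per processor)
    scales -- scaling factor s_i >= 1 per task (s_i = 1 means full frequency)
    """
    if len(times) != len(scales):
        raise ValueError("times and scales must have the same length")
    scaled_times = [s * t for s, t in zip(scales, times)]
    # The join waits for the slowest task.
    makespan = max(scaled_times)
    # Assumed dynamic power p_dyn * s^-3, drawn only while the task runs.
    dynamic = sum(p_dyn * s ** -3 * st for s, st in zip(scales, scaled_times))
    # Assumed static power, drawn by every processor until the join.
    static = p_static * makespan * len(times)
    return dynamic + static


def balanced_scales(times):
    """Slow each task down so that all tasks finish with the longest one."""
    t_max = max(times)
    return [t_max / t for t in times]


if __name__ == "__main__":
    times = [4.0, 2.0, 1.0]
    unscaled = fork_join_energy(times, [1.0] * len(times))
    balanced = fork_join_energy(times, balanced_scales(times))
    print(f"unscaled: {unscaled:.4f}, balanced: {balanced:.4f}")
```

In this toy setting, slowing the shorter tasks so that they finish together with the longest one leaves the makespan unchanged and lowers the dynamic energy. Whether this matches the energy‐optimal state derived in the article would have to be checked against the full text.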
