A Waterfall Model to Achieve Energy Efficient Tasks Mapping for Large Scale GPU Clusters

High energy consumption has become a critical problem for supercomputer systems. GPU clusters are becoming an increasingly popular architecture for building supercomputers because of their large performance gains. In this paper, we first formulate the task mapping problem as a minimal energy consumption problem with a deadline constraint. Its optimization objective differs from that of the traditional mapping problem, which usually aims at minimizing makespan or response time. We then propose a Waterfall Energy Consumption Model, which abstracts the energy consumption of a GPU cluster system into several levels from high to low, to achieve energy-efficient task mapping for large-scale GPU clusters. Based on the Waterfall Model, we develop a new task mapping algorithm that applies different energy-saving strategies to keep the system at lower energy levels. The mapping algorithm adopts Dynamic Voltage Scaling, Dynamic Resource Scaling, and $\beta$-migration for GPU sub-tasks to significantly reduce energy consumption and achieve better load balance for GPU clusters. We develop a task generator based on real task traces, and the simulation results show that our mapping algorithm based on the Waterfall Model reduces energy consumption by nearly 50% compared with traditional approaches, which can only run at a high energy level. Task deadlines can be satisfied, and the task execution time of our mapping algorithm can also be reduced.
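To make the deadline-constrained formulation concrete, the following is a minimal sketch of the general idea behind Dynamic Voltage Scaling, not the paper's algorithm. It assumes a textbook model in which dynamic power grows with the cube of frequency and run time is work divided by frequency. The names `Task`, `energy`, and `pick_frequency`, the constant `k`, and the frequency levels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Task:
    cycles: float    # amount of work (hypothetical unit: giga-cycles)
    deadline: float  # latest allowed completion time, in seconds


def energy(cycles: float, freq: float, k: float = 1.0) -> float:
    # Assumed model: power = k * f^3 and time = cycles / f,
    # so energy = power * time = k * f^2 * cycles.
    return k * freq ** 2 * cycles


def pick_frequency(task: Task, levels: Sequence[float]) -> Optional[float]:
    # Keep only the frequency levels that finish the task by its deadline.
    feasible = [f for f in levels if task.cycles / f <= task.deadline]
    if not feasible:
        return None  # no level can meet the deadline
    # Among feasible levels, choose the one with the lowest energy.
    return min(feasible, key=lambda f: energy(task.cycles, f))


task = Task(cycles=10.0, deadline=8.0)
best = pick_frequency(task, [1.0, 1.5, 2.0])
print(best, energy(task.cycles, best))
```

Under this assumed model, energy rises with frequency, so the choice reduces to the slowest level that still meets the deadline. A makespan-oriented mapper would instead pick the fastest level.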
