Energy-efficient resource management for high-performance computing platforms

In the past decade, high-performance computing (HPC) platforms such as clusters and computational grids have been widely used to solve challenging engineering problems in industrial and scientific applications. Because energy costs are extremely high, reducing energy consumption has become a major concern in designing economical and environmentally friendly HPC infrastructures. In this dissertation, we first describe a general architecture for building energy-efficient HPC infrastructures, in which energy-efficient techniques can be incorporated into each layer. Next, we develop an array of energy-efficient scheduling and energy-aware load-balancing algorithms for high-performance clusters, computational grids, and large-scale storage systems. The primary goal of this research is to minimize energy consumption while maintaining reasonably high performance by incorporating energy-aware resource management techniques into HPC platforms. We have conducted extensive simulation experiments using both synthetic and real-world applications to quantitatively evaluate the energy efficiency and performance of the proposed scheduling and load-balancing strategies. Experimental results show that our approaches can reduce energy dissipation in HPC platforms without significantly degrading system performance.