Contention-Aware Reliability Efficient Scheduling on Heterogeneous Computing Systems

Energy efficiency and system reliability are two key metrics in modern high-performance computing. Most recent studies have focused on parallel task scheduling that achieves low energy consumption or short execution time, and these approaches were developed under the classic scheduling model. However, the contention model is gaining recognition as a more practical basis for creating accurate and efficient schedules. This study proposes a contention-aware reliability management algorithm with deadline and energy budget constraints (CARMEB) for parallel task scheduling in heterogeneous computing systems. CARMEB involves three phases: task priority calculation, communication edge allocation, and slack reclaiming. The algorithm is evaluated through extensive experiments on randomly generated task graphs and on three types of task graphs drawn from real-world applications. The results show that CARMEB significantly improves system reliability.
