Maximizing reliability of energy constrained parallel applications on heterogeneous distributed systems

Abstract: Energy is one of the primary design constraints in heterogeneous distributed systems, which range from small embedded devices to large-scale data centers. In such systems, a parallel application with precedence-constrained tasks is represented by a directed acyclic graph (DAG). Dynamic voltage and frequency scaling (DVFS) has become an important energy control technology: it simultaneously scales down a processor's supply voltage and frequency while tasks are running. However, recent studies show that dynamically scaling down the chip's voltage may lead to a sharp rise in transient failures of processors, thereby reducing the reliability of the system. This study addresses the problem of maximizing the reliability of an energy-constrained parallel application on heterogeneous distributed systems based on DVFS. The problem is decomposed into two sub-problems, namely, satisfying the energy constraint and maximizing reliability. The first sub-problem is solved by transferring the energy constraint of the application to each of its tasks, and the second sub-problem is solved by heuristically scheduling each task with the maximum reliability value that still satisfies its energy constraint. Experiments with real parallel applications show that the proposed MREC algorithm obtains larger reliability values than the state-of-the-art reliability maximum energy conservation (RMEC) algorithm while satisfying the energy constraints.
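The two-step decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the MREC algorithm itself, and everything concrete in it is an assumption made for illustration: the power model (a frequency-independent term plus `c_ef * f**m`), the exponential transient-fault model in which the failure rate grows as frequency drops, the rule that a task's energy budget is the application budget minus the energy already spent minus the minimum energy of the tasks not yet scheduled, and all names (`Processor`, `energy`, `reliability`, `options`, `schedule`). Tasks are assumed to arrive already ordered by priority; precedence timing and communication are ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    p_static: float  # frequency-independent power (assumed model)
    c_ef: float      # effective switching capacitance (assumed model)
    m: float         # dynamic power exponent (assumed model)
    lam_max: float   # transient failure rate at maximum frequency
    d: float         # sensitivity of the failure rate to frequency scaling
    freqs: tuple     # available normalized frequencies


def energy(proc, wcet, f):
    """Energy of a task whose execution time is `wcet` at f = 1.0, run at f."""
    power = proc.p_static + proc.c_ef * f ** proc.m
    return power * wcet / f


def reliability(proc, wcet, f):
    """Probability of no transient failure; the rate rises as f is lowered."""
    f_min, f_max = min(proc.freqs), max(proc.freqs)
    span = (f_max - f_min) or 1.0
    lam = proc.lam_max * 10 ** (proc.d * (f_max - f) / span)
    return math.exp(-lam * wcet / f)


def options(procs, wcets):
    """All (processor index, frequency, energy, reliability) choices of a task."""
    return [(k, f, energy(p, wcets[k], f), reliability(p, wcets[k], f))
            for k, p in enumerate(procs) for f in p.freqs]


def schedule(procs, tasks, e_given):
    """tasks: per-processor execution times of each task, in priority order."""
    opts = [options(procs, w) for w in tasks]
    e_min = [min(o[2] for o in task_opts) for task_opts in opts]
    if sum(e_min) > e_given:
        raise ValueError("energy constraint is infeasible")
    spent, plan = 0.0, []
    for i, task_opts in enumerate(opts):
        # Sub-problem 1: transfer the application budget to this task by
        # reserving the minimum energy of every task not yet scheduled.
        budget = e_given - spent - sum(e_min[i + 1:])
        # Sub-problem 2: pick the most reliable choice within that budget.
        feasible = [o for o in task_opts if o[2] <= budget + 1e-12]
        best = max(feasible, key=lambda o: o[3])
        plan.append(best)
        spent += best[2]
    return plan, spent, math.prod(o[3] for o in plan)


if __name__ == "__main__":
    procs = [Processor(0.05, 1.0, 3.0, 1e-5, 2.0, (0.4, 0.7, 1.0)),
             Processor(0.03, 0.8, 2.8, 2e-5, 2.0, (0.5, 1.0))]
    tasks = [(10.0, 12.0), (8.0, 6.0), (15.0, 14.0)]
    plan, spent, rel = schedule(procs, tasks, e_given=25.0)
    print(plan, spent, rel)
```

Reserving the minimum energy of the remaining tasks guarantees that each task always has at least one feasible choice, so the application-level constraint holds by construction. The greedy per-task choice is a heuristic and does not guarantee the globally maximal reliability.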
