Reliability and temperature constrained task scheduling for makespan minimization on heterogeneous multi-core platforms

Abstract: We study the problem of scheduling tasks onto a heterogeneous multi-core processor platform to minimize makespan, where each cluster on the platform has a probability of failure governed by an exponential law and the platform has a thermal constraint specified by a peak temperature threshold. Our goal is to design algorithms that minimize makespan under reliability and temperature constraints. We first provide a mixed-integer linear programming (MILP) formulation for assigning and scheduling independent tasks on the heterogeneous platform so as to minimize makespan subject to the reliability and temperature constraints. However, solving the MILP takes exponential time. We therefore propose a two-stage heuristic that, based on an analysis of how task assignment affects makespan, reliability, and temperature, determines the assignment, replication, operating frequency, and execution order of tasks to minimize makespan while satisfying the real-time, reliability, and temperature constraints. Finally, we carry out extensive simulation experiments to validate the proposed MILP formulation and two-stage heuristic. The simulation results show that the MILP formulation achieves the greatest makespan reduction among all the methods compared. They also show that, in terms of makespan reduction, the two-stage heuristic performs close to the representative existing approach ESTS and better than the representative existing approach RBSA. In addition, the two-stage heuristic achieves the highest feasibility, ahead of both RBSA and ESTS.
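
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's MILP formulation or its two-stage heuristic. It assumes the textbook exponential reliability model exp(-lambda * t), independent replica failures, and sequential execution of the tasks assigned to each cluster. The function names, cluster names, and numeric values are hypothetical, and the temperature constraint is not modeled.

```python
import math


def task_reliability(failure_rate, exec_time):
    """Probability that one execution finishes without failure.

    Assumes failures follow an exponential law with a constant rate,
    so reliability over a duration t is exp(-rate * t).
    """
    return math.exp(-failure_rate * exec_time)


def replicated_reliability(replica_reliabilities):
    """Reliability of a task with several replicas.

    Assumes replicas fail independently; the task succeeds if at
    least one replica succeeds.
    """
    all_fail = 1.0
    for r in replica_reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def makespan(assignment, exec_time):
    """Largest total load over all clusters.

    assignment: list of (task, cluster) pairs, replicas included.
    exec_time: exec_time[task][cluster] is the execution time of the
    task on that cluster. Tasks on one cluster are assumed to run
    back to back.
    """
    load = {}
    for task, cluster in assignment:
        load[cluster] = load.get(cluster, 0.0) + exec_time[task][cluster]
    return max(load.values()) if load else 0.0


# Hypothetical two-cluster platform and two independent tasks.
exec_time = {
    "t1": {"big": 2.0, "little": 4.0},
    "t2": {"big": 3.0, "little": 6.0},
}
failure_rate = {"big": 1e-3, "little": 5e-4}  # illustrative values only

# t1 is replicated on both clusters; t2 runs once on "big".
assignment = [("t1", "big"), ("t2", "big"), ("t1", "little")]

t1_reliability = replicated_reliability(
    [task_reliability(failure_rate[c], exec_time["t1"][c])
     for c in ("big", "little")]
)
schedule_makespan = makespan(assignment, exec_time)
```

The sketch shows the trade-off the abstract refers to: replicating t1 raises its reliability above that of either single execution, but the extra replica adds load to a cluster and can lengthen the makespan.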