Variation-aware task allocation and scheduling for improving reliability of real-time MPSoCs

Both soft-error reliability (SER) due to transient faults and lifetime reliability (LTR) due to permanent faults are key concerns in real-time MPSoCs. Existing works have investigated related problems; however, most of them focus on only one of the two reliability concerns. A few efforts do consider both types of reliability together but ignore the impact of hardware- and application-level variations on reliability, and thus are not applicable to state-of-the-art MPSoCs under variations. In this paper, we focus on increasing SER without sacrificing LTR, since transient faults occur much more frequently than permanent faults. Specifically, we propose a novel task allocation and scheduling scheme that maximizes SER while satisfying an LTR constraint for soft real-time MPSoCs. Because SER is the objective while LTR is a constraint in our problem, and LTR is highly related to core temperature profiles, we concentrate on investigating how variations in core soft-error rate, task vulnerability to soft errors, and task execution time affect SER. To the best of our knowledge, our work is the first attempt to jointly handle the two reliability issues while also taking into account the effects of variations on reliability. Experimental results show that our scheme improves SER by up to 66% compared with a number of representative existing approaches while meeting the same LTR constraint.
