ReMap: Reliability Management of Peak-Power-Aware Real-Time Embedded Systems Through Task Replication

Increasing power density in future technology nodes is a crucial issue for multicore platforms. As the number of cores grows, power budget constraints may prevent all cores from being powered simultaneously at their full performance level. Chip manufacturers therefore specify a power budget for each chip, the Thermal Design Power (TDP). Meanwhile, multicore platforms are well suited to implementing fault-tolerance techniques that achieve high reliability. Task replication is a well-known technique for tolerating transient faults. However, careless task replication may lead to significant peak power consumption. In this paper, we consider the problem of achieving a given reliability target while keeping the total power consumption under the chip TDP for a set of periodic soft real-time tasks. For this purpose, we propose a method for mapping and scheduling periodic soft real-time tasks in multicore embedded systems. The proposed method consists of three parts: (i) Reliability-Aware Lowest Utilization Mapping, (ii) Maximum-Power-Aware EDF Scheduling, and (iii) Reliability-and-PeakPower-Aware Dynamic-Voltage-Frequency-Scaling. Our experiments show that the proposed method reduces peak power by up to 38.4% (25% on average) compared to state-of-the-art methods.
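To make the idea of replication-based, reliability-aware mapping concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that replica failures are independent, so that k replicas of a task with per-copy reliability r reach a reliability of 1 - (1 - r)^k, and that each core is schedulable under EDF as long as its total utilization does not exceed 1. The function names, the task tuple layout, and the greedy largest-first ordering are illustrative assumptions; peak power and DVFS are not modeled here.

```python
def replicas_needed(task_reliability, target):
    """Smallest k with 1 - (1 - r)^k >= target (assumes independent replica failures)."""
    if not (0.0 < task_reliability < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = 1
    while 1.0 - (1.0 - task_reliability) ** k < target:
        k += 1
    return k


def map_lowest_utilization(tasks, num_cores, target):
    """Greedy mapping sketch (hypothetical helper, not the paper's method).

    tasks: list of (name, utilization, per-copy reliability).
    Each task's replicas are placed on the k least-loaded distinct cores.
    Returns (mapping of task name -> list of core indices, per-core utilization).
    """
    core_util = [0.0] * num_cores
    mapping = {}
    # Place the heaviest tasks first so large replicas see the emptiest cores.
    for name, util, rel in sorted(tasks, key=lambda t: -t[1]):
        k = replicas_needed(rel, target)
        if k > num_cores:
            raise ValueError(f"task {name} needs {k} replicas but only {num_cores} cores exist")
        cores = sorted(range(num_cores), key=lambda c: core_util[c])[:k]
        # Single-core EDF bound: total utilization on a core must stay at or below 1.
        if any(core_util[c] + util > 1.0 for c in cores):
            raise ValueError(f"task {name} does not fit on the least-loaded cores")
        for c in cores:
            core_util[c] += util
        mapping[name] = cores
    return mapping, core_util


if __name__ == "__main__":
    demo_tasks = [("a", 0.4, 0.99), ("b", 0.3, 0.9)]
    placement, load = map_lowest_utilization(demo_tasks, num_cores=4, target=0.995)
    print(placement)
    print(load)
```

In this sketch the less reliable task receives more replicas, and spreading replicas over the least-loaded cores keeps per-core utilization low, which leaves slack that a later scheduling or voltage-scaling step could use.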
