Energy and reliability management in parallel real-time systems

Historically, slack time in real-time systems has been used as temporal redundancy by roll-back recovery schemes to increase system reliability in the presence of faults. With technologies such as voltage scaling, however, slack time can also be used by energy management schemes to save energy. For reliable real-time systems, where high reliability is as important as low energy consumption, centralized management of slack time is desirable. Energy management schemes are explored first, for frame-based parallel real-time applications. Simple static power management, which allocates static slack evenly over a schedule, is optimal for uni-processor systems, but it is not optimal for parallel systems because the level of parallelism varies within a schedule. A parallel static power management scheme that takes these variations into account is proposed. When dynamic slack is considered, and assuming global scheduling strategies, slack shifting and sharing schemes as well as speculation schemes are proposed to save more energy. For simultaneous management of power and reliability, checkpointing techniques are first deployed to use slack time efficiently, and the optimal number of checkpoints needed to minimize energy consumption or to maximize system reliability is explored. Then, an energy-efficient optimistic modular redundancy scheme is addressed. Finally, a framework that encompasses energy and reliability management is proposed for obtaining optimal redundant configurations. In exploring the trade-off between energy and reliability, the effects of voltage scaling on fault rates are considered.
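
The claim that evenly allocated slack is not optimal for a parallel schedule can be illustrated with a small sketch. It is not the scheme proposed in this work, only a toy model under stated assumptions: a schedule is a sequence of sections, each with a length at full speed and a number of active processors; all active processors in a section run at one common normalized speed; dynamic power is the cube of the speed; idle processors consume nothing; and the computed speeds stay within the processors' speed range. The function names and the example numbers are hypothetical.

```python
def uniform_speed(sections, deadline):
    """One speed for the whole schedule, stretching it exactly to the deadline."""
    total_length = sum(length for length, _ in sections)
    return total_length / deadline


def parallelism_aware_speeds(sections, deadline):
    """Per-section speeds minimizing sum(p * t * s**2) s.t. sum(t / s) == deadline.

    Under the assumed cubic power model, the Lagrangian condition gives
    s_i proportional to p_i ** (-1/3), so more parallel sections run slower.
    """
    c = sum(length * procs ** (1 / 3) for length, procs in sections) / deadline
    return [c * procs ** (-1 / 3) for _, procs in sections]


def energy(sections, speeds):
    """Energy = active processors * power (s**3) * stretched time (t / s)."""
    return sum(procs * length * s ** 2
               for (length, procs), s in zip(sections, speeds))


# Hypothetical schedule: (length at full speed, active processors).
sections = [(2.0, 1), (2.0, 8)]
deadline = 8.0

s_uniform = uniform_speed(sections, deadline)
e_uniform = energy(sections, [s_uniform] * len(sections))

s_aware = parallelism_aware_speeds(sections, deadline)
e_aware = energy(sections, s_aware)

print(f"uniform speed {s_uniform:.3f}, energy {e_uniform:.3f}")
print(f"per-section speeds {[round(s, 3) for s in s_aware]}, energy {e_aware:.3f}")
```

In this toy model both allocations finish exactly at the deadline, but giving more of the slack to the highly parallel section lowers the total energy. A real scheme would also have to respect minimum and maximum speeds and the precedence structure of the schedule.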
