An Asymmetric Checkpointing and Rollback Error Recovery Scheme for Embedded Processors

This paper presents a checkpointing scheme for rollback error recovery, called Asymmetric Checkpointing and Rollback Recovery (ACRR), which stores processor states in an asymmetric manner. This reduces both the error recovery latency and the number of checkpoints, which increases the probability that soft real-time tasks complete on time. ACRR was first evaluated analytically. The analytical results show that the recovery latency decreases as the non-uniformity of the checkpoints increases. As a case study, ACRR was implemented and simulated on a behavioral VHDL model of the LEON2 processor. The simulation results agree with those of the analytical study.
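The following is a minimal sketch of the basic rollback step that any checkpointing scheme relies on. When an error is detected, the processor restores the most recent checkpoint and re-executes the work done since then. The abstract does not say how ACRR places its checkpoints, so everything here is a hypothetical illustration rather than the paper's model. The checkpoint times, the `restore_cost` parameter, and the helper names `rollback_latency` and `meets_deadline` are all assumptions. The model also ignores the overhead of saving checkpoints and considers only a single error.

```python
from bisect import bisect_right


def rollback_latency(checkpoints, error_time, restore_cost=0.0):
    """Time lost when an error detected at error_time forces a rollback
    to the most recent checkpoint (hypothetical model, not ACRR itself).

    checkpoints must be sorted in ascending order of time.
    """
    # Index of the last checkpoint taken at or before the error.
    i = bisect_right(checkpoints, error_time) - 1
    if i < 0:
        raise ValueError("no checkpoint precedes the error")
    # Work between that checkpoint and the error must be re-executed,
    # plus the (assumed) cost of restoring the saved processor state.
    return (error_time - checkpoints[i]) + restore_cost


def meets_deadline(exec_time, deadline, checkpoints, error_time, restore_cost=0.0):
    """True if the task still finishes by its deadline after one rollback.

    Simplifying assumption: checkpoint saving overhead is ignored.
    """
    finish = exec_time + rollback_latency(checkpoints, error_time, restore_cost)
    return finish <= deadline


if __name__ == "__main__":
    # Hypothetical placements of 4 checkpoints in a task of length 100.
    uniform = [0, 25, 50, 75]
    non_uniform = [0, 40, 70, 90]  # intervals shrink toward the end of the task

    for name, ckpts in (("uniform", uniform), ("non_uniform", non_uniform)):
        print(name,
              rollback_latency(ckpts, 95, restore_cost=1),
              meets_deadline(100, 110, ckpts, 95, restore_cost=1))
```

For an error detected at time 95, the uniform placement loses 21 time units and misses the hypothetical deadline of 110. The non-uniform placement loses only 6 and meets it. For an error early in the task, such as at time 35, the comparison reverses (10 versus 35). Whether non-uniform placement helps therefore depends on when errors occur and on how much slack remains before the deadline, which is the trade-off the paper's analytical study examines.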
