A Low-Latency DMR Architecture with Fast Checkpoint Recovery Scheme

This chapter presents a novel architecture for a fault-tolerant dual modular redundancy (DMR) system that uses a checkpoint recovery approach. The architecture exploits an SRAM with simultaneous-copy and instantaneous-compare functions. Because this SRAM copies data between the dual cores with low latency, backup and rollback are fast. The instantaneous comparison also consumes less power than comparing data with a Cyclic Redundancy Check (CRC). Evaluation results show that, compared with conventional checkpoint/restart DMR, the proposed architecture reduces the cycle overhead by 97.8% and achieves a 3.28% low-latency execution cycle even when a single fault occurs during task execution. The proposed architecture therefore provides high reliability for systems with real-time requirements.
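The checkpoint recovery flow that the abstract describes can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the chapter's hardware design: the function name `run_dmr`, the integer task state, the fixed checkpoint interval, and the single bit-flip fault model are all hypothetical, and the SRAM copy and compare operations are modelled as ordinary assignment and equality.

```python
def run_dmr(steps, initial_state, checkpoint_interval, fault_at=None):
    """Simulate two redundant cores with checkpoint/rollback (illustrative only).

    steps: list of functions mapping an int state to the next int state.
    fault_at: index of the step after which core B suffers a one-time bit flip.
    Returns (final_state, executed_steps, rollbacks).
    """
    core_a = core_b = initial_state
    backup, backup_index = initial_state, 0   # last state both cores agreed on
    fault_pending = fault_at is not None
    i = executed = rollbacks = 0

    while i < len(steps):
        # Both cores execute the same step on their own copy of the state.
        core_a = steps[i](core_a)
        core_b = steps[i](core_b)
        executed += 1

        # Assumed fault model: flip the lowest bit of core B's state once.
        if fault_pending and i == fault_at:
            core_b ^= 1
            fault_pending = False
        i += 1

        # At each checkpoint (and at the end of the task), compare the cores.
        if i % checkpoint_interval == 0 or i == len(steps):
            if core_a == core_b:
                backup, backup_index = core_a, i      # backup: save agreed state
            else:
                core_a = core_b = backup              # rollback: restore both cores
                i = backup_index
                rollbacks += 1

    return core_a, executed, rollbacks


if __name__ == "__main__":
    task = [lambda s: s + 1] * 8
    print(run_dmr(task, 0, checkpoint_interval=4))
    print(run_dmr(task, 0, checkpoint_interval=4, fault_at=5))
```

In this toy model a single fault costs one extra checkpoint interval of re-execution. The copy and compare steps are free here, whereas in a real system their latency is the overhead that the proposed SRAM is intended to reduce.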
