A Stage-Wise Soft-Error Detection Scheme for Flip-Flop Based Pipelines in Secure Cloud Servers

Shrinking feature sizes make transistors increasingly susceptible to soft errors, which can severely degrade a system's RAS (Reliability, Availability, and Serviceability). The challenge stems not only from the rising soft error rate (SER) of storage cells, but also from the growing susceptibility of combinational logic to soft errors. Efficient soft-error detection is therefore the primary problem in Backward Error Recovery (BER) schemes, which are a cost-effective approach to soft-error tolerance. This paper presents AUDITOR, a soft-error detection scheme for flip-flop based pipelines. AUDITOR copes with both types of soft errors: single event upsets (SEUs) and single event transients (SETs). We propose a "local-audit" fault detection mechanism in which each pipeline stage is verified independently and the verification result is recorded in a dedicated "audit" bit (V-bit). The V-bits are distributed across the whole pipeline and jointly monitor pipeline execution. The fully synchronous operation inherent in flip-flop based pipelines constrains SET detection capability; we propose a path-compensation technique to relax this constraint. Furthermore, a reuse-based design paradigm is employed to reduce implementation complexity and area overhead. AUDITOR achieves robust detection capability and short detection latency, at the expense of increases of about 29% in area and 50% in power consumption.
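The local-audit idea can be illustrated with a small behavioral sketch. This is a toy model under stated assumptions, not the AUDITOR circuit: the abstract does not say how a stage is verified, so the sketch simply compares each stage's latched value against a fault-free recomputation. The names `Stage`, `step`, and the `flip` fault-injection parameter are hypothetical, and path compensation and the reuse-based design are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    logic: Callable[[int], int]  # combinational logic of this stage
    reg: int = 0                 # value latched in the stage's flip-flops
    v_bit: bool = False          # hypothetical audit bit: True = mismatch seen


def step(stages: List[Stage], new_input: int,
         flip: Optional[Tuple[int, int]] = None) -> bool:
    """Advance the pipeline by one clock cycle.

    flip = (stage_index, bit_mask) injects a fault by XOR-ing bit_mask
    into the value latched by that stage in this cycle.
    Returns True if any stage raised its V-bit in this cycle.
    """
    # Snapshot the inputs first so every stage sees last cycle's registers.
    inputs = [new_input] + [s.reg for s in stages[:-1]]
    for i, (stage, x) in enumerate(zip(stages, inputs)):
        expected = stage.logic(x)
        latched = expected
        if flip is not None and flip[0] == i:
            latched ^= flip[1]  # injected soft error
        stage.reg = latched
        # Assumed check: compare against a fault-free recomputation.
        stage.v_bit = latched != expected
    # The distributed V-bits are combined into one pipeline-level flag.
    return any(s.v_bit for s in stages)


pipeline = [Stage(lambda x: x + 1), Stage(lambda x: x * 2), Stage(lambda x: x ^ 0xF)]
clean = step(pipeline, 3)                        # fault-free cycle
faulty = step(pipeline, 4, flip=(1, 0b100))      # upset in stage 1
flags = [s.v_bit for s in pipeline]
```

In this model only the stage that latched the corrupted value raises its V-bit. A downstream stage that later consumes the corrupted value computes a self-consistent result and raises nothing, which is one reason checking at the stage where the fault occurs keeps detection latency short.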
