A rapid prototyping system for error-resilient multi-processor systems-on-chip

Static and dynamic variations, which degrade the reliability of microelectronic systems, increase as CMOS technology scales down. Further downscaling is therefore only profitable if the area, energy, and delay costs of ensuring reliability stay within limits. As a result, the traditional worst-case design methodology will become infeasible. Future architectures have to be error resilient, i.e., the hardware has to tolerate transient errors autonomously. In this paper, we present an FPGA-based rapid prototyping system for multi-processor systems-on-chip composed of autonomous hardware units for error-resilient processing and interconnect. The platform enables fast architectural exploration of various error protection techniques under different failure rates at the microarchitectural level while keeping track of the system behavior. We demonstrate its applicability on a concrete wireless communication system.
