A Framework for Combining Concurrent Checking and On-Line Embedded Test for Low-Latency Fault Detection in NoC Routers

This paper addresses fault detection in NoC routers by combining concurrent checkers with embedded on-line test, enabling cost-effective trade-offs between area overhead and fault coverage. First, we propose a framework of tools for formally evaluating the quality of the checkers and for minimizing their area overhead under given fault coverage constraints. Particular emphasis is placed on minimizing the error detection latency, which is crucial for eliminating (or at least limiting) error propagation. Second, the concurrent checkers are complemented by embedded on-line test packets, which are applied as a periodic routine during idle periods of router operation. The framework and the corresponding methodology have been successfully applied to a realistic case study of a fault-tolerant NoC router design. The case study shows that combining concurrent checkers with embedded test reduces the area overhead of the checkers from 31–35% down to 1.5–10% without sacrificing fault coverage.
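
The trade-off described above can be illustrated with a minimal sketch. It is not the framework's actual optimization procedure, which the abstract does not specify. It assumes a simple greedy heuristic, and every name in it (`select_checkers`, the checker names, the areas, and the fault sets) is hypothetical. The idea is that faults already detected by on-line test packets need no checker, so fewer checkers are required to reach the same coverage. Detection latency, which is longer for faults caught only by periodic test packets, is not modeled here.

```python
from typing import Dict, List, Set, Tuple


def select_checkers(
    checkers: Dict[str, Tuple[float, Set[int]]],
    all_faults: Set[int],
    covered_by_test: Set[int],
    target_coverage: float,
) -> Tuple[List[str], float, float]:
    """Greedily pick checkers until the target fault coverage is reached.

    checkers maps a (hypothetical) checker name to (area > 0, faults it detects).
    Returns (chosen checker names, total checker area, achieved coverage).
    """
    covered = set(covered_by_test) & all_faults  # faults caught by test packets
    needed = target_coverage * len(all_faults)
    remaining = dict(checkers)
    chosen: List[str] = []
    area = 0.0

    def gain(name: str) -> float:
        # Newly covered faults per unit of area for this checker.
        a, faults = remaining[name]
        return len((faults & all_faults) - covered) / a

    while len(covered) < needed and remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining checker adds coverage
        a, faults = remaining.pop(best)
        chosen.append(best)
        area += a
        covered |= faults & all_faults

    return chosen, area, len(covered) / len(all_faults)


# Made-up example data: 10 faults, 3 candidate checkers.
faults = set(range(10))
candidates = {
    "c_arbiter": (4.0, {0, 1, 2, 3}),
    "c_routing": (3.0, {3, 4, 5}),
    "c_fifo": (5.0, {6, 7, 8, 9}),
}

# Checkers only: all three are needed for full coverage.
only_checkers = select_checkers(candidates, faults, set(), 1.0)
# Checkers plus test packets that already detect faults 0..6.
with_test = select_checkers(candidates, faults, {0, 1, 2, 3, 4, 5, 6}, 1.0)
```

In this toy data set, full coverage with checkers alone requires all three checkers, whereas adding the test packets leaves only `c_fifo` to be implemented. A greedy heuristic is not guaranteed to find the minimum-area set; it only serves to show the shape of the problem.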
