Self-Adaptive System for Addressing Permanent Errors in On-Chip Interconnects

We present a self-contained adaptive system for detecting and bypassing permanent errors in on-chip interconnects. The proposed system reroutes data on erroneous links to a set of spare wires without interrupting the data flow. To detect permanent errors at runtime, a novel in-line test (ILT) method using spare wires and a test pattern generator is proposed. In addition, an improved syndrome storing-based detection (SSD) method is presented and compared to the ILT method. Each detection method (ILT and SSD) is integrated individually into the noninterrupting adaptive system, and a case study is performed to compare them with Hamming and Bose-Chaudhuri-Hocquenghem (BCH) code implementations. In the presence of permanent errors, the probability of correct transmission in the proposed systems is improved by up to 140% over the standalone Hamming code. Furthermore, our methods achieve up to 38% area, 64% energy, and 61% latency improvements over the BCH implementation at comparable error performance.

[1]  William J. Dally,et al.  Route packets, not wires: on-chip inteconnection networks , 2001, DAC '01.

[2]  Luca Benini,et al.  Networks on chips - technology and tools , 2006, The Morgan Kaufmann series in systems on silicon.

[3]  W. Dally,et al.  Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[4]  Pasi Liljeberg,et al.  Analysis of forward error correction methods for nanoscale networks-on-chip , 2007, Nano-Net.

[5]  M. J. Irwin,et al.  Adaptive error protection for energy efficiency , 2003, ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486).

[6]  Kevin Skadron,et al.  Interconnect Lifetime Prediction for Reliability-Aware Systems , 2007, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[7]  Partha Pratim Pande,et al.  NoC Interconnect Yield Improvement Using Crosspoint Redundancy , 2006, 2006 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems.

[8]  Li Shang,et al.  Dynamic voltage scaling with links for power optimization of interconnection networks , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[9]  David M. Bloom,et al.  Probabilities of Clumps in a Binary Sequence (and How to Evaluate Them Without Knowing a Lot) , 1996 .

[10]  Mary Jane Irwin,et al.  Adapative Error Protection for Energy Efficiency , 2003, ICCAD 2003.

[11]  Pasi Liljeberg,et al.  Online Reconfigurable Self-Timed Links for Fault Tolerant NoC , 2007, VLSI Design.

[12]  Luca Benini,et al.  Error control schemes for on-chip communication links: the energy-reliability tradeoff , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13]  Axel Jantsch,et al.  A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip , 2003, First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721).

[14]  Partha Pratim Pande,et al.  Testing Network-on-Chip Communication Fabrics , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[15]  Naresh R. Shanbhag,et al.  Coding for systern-on-chip networks: a unified framework , 2004, Proceedings. 41st Design Automation Conference, 2004..

[16]  Barry W. Johnson Design & analysis of fault tolerant digital systems , 1988 .

[17]  Paul Ampadu,et al.  Adaptive error control for reliable systems-on-chip , 2008, 2008 IEEE International Symposium on Circuits and Systems.

[18]  Luca Benini,et al.  Analysis of error recovery schemes for networks on chips , 2005, IEEE Design & Test of Computers.

[19]  Luca Benini,et al.  On-Chip Communication Architectures: System on Chip Interconnect , 2008 .

[20]  Yehea I. Ismail,et al.  Effects of inductance on the propagation delay and repeater insertion in VLSI circuits , 2000, IEEE Trans. Very Large Scale Integr. Syst..

[21]  Rodger E. Ziemer,et al.  Introduction to digital communication , 1992 .

[22]  Shantanu Dutt,et al.  Methodologies for Tolerating Cell and Interconnect Faults in FPGAs , 1998, IEEE Trans. Computers.

[23]  Bashir M. Al-Hashimi,et al.  Joint consideration of fault-tolerance, energy-efficiency and performance in on-chip networks , 2007 .

[24]  Paolo Prinetto,et al.  Reliability in Application Specific Mesh-Based NoC Architectures , 2008, 2008 14th IEEE International On-Line Testing Symposium.

[25]  R. Ramaswami,et al.  Book Review: Design and Analysis of Fault-Tolerant Digital Systems , 1990 .

[26]  Cecilia Metra,et al.  Configurable Error Control Scheme for NoC Signal Integrity , 2007, 13th IEEE International On-Line Testing Symposium (IOLTS 2007).

[27]  Ram Krishnamurthy,et al.  A robust alternate repeater technique for high performance busses in the multi-core era , 2008, 2008 IEEE International Symposium on Circuits and Systems.

[28]  G. Lemieux,et al.  Defect-tolerant FPGA switch block and connection block with fine-grain redundancy for yield enhancement , 2005, International Conference on Field Programmable Logic and Applications, 2005..