Improving accuracy of on-chip diagnosis via incremental learning

On-chip test and diagnosis has been proposed as an effective way to ensure the lifetime reliability of integrated systems. To manage the complexity of this approach, an integrated system is partitioned into multiple modules, each of which can be periodically tested, diagnosed, and repaired if necessary. Limited on-chip memory and computing capability, coupled with the inherent uncertainty of diagnosis, lead to misdiagnoses. To address this challenge, a novel incremental-learning algorithm, dynamic k-nearest-neighbor (DKNN), is developed to improve the accuracy of on-chip diagnosis. Unlike conventional KNN, DKNN uses online diagnosis data to update the learned classifier, so the classifier keeps evolving as new diagnosis data become available. Incorporating online diagnosis data lets the classifier track the fault distribution and thus improves diagnostic accuracy. Experiments on various benchmark circuits (e.g., the cache controller from the OpenSPARC T2 processor design) demonstrate that diagnostic accuracy can be more than doubled.
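The abstract does not spell out DKNN's update rule, so the following is only a minimal sketch of the general idea it describes: a k-nearest-neighbor classifier whose stored training data is updated as new diagnosis results arrive. The class name `IncrementalKNN`, the `capacity` parameter, and the oldest-first eviction policy are assumptions made for illustration (the eviction stands in for limited on-chip memory); they are not taken from the paper.

```python
from collections import Counter, deque
import math


class IncrementalKNN:
    """Toy k-nearest-neighbor classifier whose training set is updated online.

    Hypothetical illustration only; not the DKNN algorithm from the paper.
    """

    def __init__(self, k=3, capacity=64):
        self.k = k
        # Bounded buffer: once full, appending evicts the oldest sample.
        # This FIFO policy is an assumption that mimics limited memory.
        self.samples = deque(maxlen=capacity)

    def update(self, features, label):
        """Store one newly confirmed diagnosis result (features, faulty module)."""
        self.samples.append((tuple(features), label))

    def predict(self, features):
        """Return the majority label among the k nearest stored samples."""
        if not self.samples:
            raise ValueError("no training data stored")
        query = tuple(features)
        # Sort stored samples by Euclidean distance to the query point.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], query))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]
```

Because old samples are evicted as new ones arrive, the stored data gradually reflects the current fault distribution rather than the one seen at design time. A conventional KNN would instead keep its offline training set fixed.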
