Fault and Error Tolerance in Neural Networks: A Review

Beyond energy, the growing number of defects in physical substrates is becoming another major constraint on the design of computing devices and systems. As the underlying semiconductor technologies become less reliable, the probability that some components of a computing device will fail increases, preventing designers from realizing the full potential benefits of on-chip exascale integration derived from near-atomic-scale feature dimensions. As the quest for performance confronts permanent and transient faults, device variation, and thermal issues, major breakthroughs in computing efficiency are expected to come from unconventional and new models of computation, such as brain-inspired computing. The challenge is then to find computing solutions that are not only high-performance and energy-efficient, but also fault-tolerant. Neural computing principles remain elusive, yet they are regarded as the source of a promising fault-tolerant computing paradigm. Because fault tolerance can be translated into scalable and reliable computing systems, hardware design itself and the possibility of using circuits even when they contain faults have further motivated research on neural networks, which are potentially capable of absorbing some degree of vulnerability based on their natural properties. This paper presents a survey on fault tolerance in neural networks, focusing mainly on well-established passive techniques that exploit and improve, by design, this potential but limited intrinsic property of neural models, particularly feedforward neural networks. First, fundamental concepts and background on fault tolerance are introduced. Then, we review the fault types, models, and measures used to evaluate performance, and provide a taxonomy of the main techniques for enhancing the intrinsic properties of some neural models, based on the principles and mechanisms that they exploit to achieve fault tolerance passively. For completeness, we briefly review some representative works on active fault tolerance in neural networks. We present some key challenges that remain to be overcome and conclude with an outlook for this field.
