When Neurons Fail - Technical Report

Neural networks have traditionally been considered robust in the sense that their precision degrades gracefully when neurons fail, and that this degradation can be compensated by additional learning phases. Nevertheless, critical applications for which neural networks are now appealing solutions cannot afford any additional learning at run-time. In this paper, we view a multilayer neural network as a distributed system in which neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is on a quantity we call the Forward Error Propagation, which captures how much error is propagated by a neural network when a given number of components fail. Computing this quantity only requires looking at the topology of the network. In contrast, experimentally assessing the robustness of a network requires looking at all possible inputs and testing all possible configurations of the network corresponding to different failure situations, which faces a discouraging combinatorial explosion. We distinguish the case of neurons that fail by stopping their activity (crashed neurons) from the case of neurons that fail by transmitting arbitrary values (Byzantine neurons). In the crash case, our bound involves the number of neurons per layer, the Lipschitz constant of the neural activation function, the number of failing neurons, the synaptic weights, and the depth of the layer where the failure occurred. In the case of Byzantine failures, our bound involves, in addition, the synaptic transmission capacity. Interestingly, as we show in the paper, our bound can easily be extended to the case where synapses can fail. We present three applications of our results. The first is a quantification of the effect of memory cost reduction on the accuracy of a neural network. The second is a quantification of the amount of information any neuron needs from its preceding layer, thereby enabling a boosting scheme that prevents neurons from waiting for unnecessary signals. The third is a quantification of the trade-off between neural network robustness and learning cost.
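
To make the crash-case reasoning concrete, the following is a minimal sketch, not the paper's Forward Error Propagation formula. It assumes a toy network with two sigmoid hidden layers and a linear output, and it uses only facts about the sigmoid: its output lies in (0, 1) and it is 1/4-Lipschitz. All names (`make_network`, `forward`, `crash_bound`) and the network sizes are hypothetical choices made for illustration. Crashing f neurons in the first hidden layer shifts each second-layer pre-activation by at most f times the largest absolute weight, the activation scales that shift by at most K, and the linear output sums it over the second layer.

```python
import math
import random


def sigmoid(z):
    # Output lies in (0, 1); Lipschitz constant is 1/4.
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, inputs):
    # One fully connected sigmoid layer; weights[i][j] links input j to neuron i.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def make_network(seed=0, n_in=4, n1=8, n2=6):
    # Hypothetical toy network with weights drawn uniformly from [-1, 1].
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n1)]
    W2 = [[rng.uniform(-1, 1) for _ in range(n1)] for _ in range(n2)]
    w_out = [rng.uniform(-1, 1) for _ in range(n2)]
    return W1, W2, w_out


def forward(x, W1, W2, w_out, crashed=frozenset()):
    # A crashed first-layer neuron stops transmitting: its output reads as 0.
    h1 = [0.0 if j in crashed else v for j, v in enumerate(layer(W1, x))]
    h2 = layer(W2, h1)
    return sum(w * v for w, v in zip(w_out, h2))


def crash_bound(f, W2, w_out, K=0.25, sup_act=1.0):
    # Worst-case output deviation when f first-layer neurons crash:
    #   each second-layer pre-activation moves by at most f * sup_act * w_max2,
    #   the K-Lipschitz activation shrinks that to K * f * sup_act * w_max2,
    #   and the linear output sums it over n2 neurons weighted by at most w_max_out.
    w_max2 = max(abs(w) for row in W2 for w in row)
    w_max_out = max(abs(w) for w in w_out)
    n2 = len(W2)
    return f * sup_act * w_max2 * K * n2 * w_max_out


if __name__ == "__main__":
    W1, W2, w_out = make_network()
    x = [0.2, 0.9, 0.5, 0.1]
    crashed = frozenset({1, 5})
    error = abs(forward(x, W1, W2, w_out, crashed) - forward(x, W1, W2, w_out))
    print("observed error:", error)
    print("topology-only bound:", crash_bound(len(crashed), W2, w_out))
```

The bound is computed from the weights and layer sizes alone, without enumerating inputs or failure configurations, which is the contrast the abstract draws with experimental assessment. It is deliberately loose: it uses the largest weight everywhere and ignores which neurons actually crashed.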
