Perceptron Theory for Predicting the Accuracy of Neural Networks

Many neural network models succeed at classification problems, yet their operation is still treated as a black box. Here, we develop a theory for one-layer perceptrons that can predict performance on classification tasks. The theory generalizes an existing theory for predicting the performance of Echo State Networks and of connectionist models for symbolic reasoning known as Vector Symbolic Architectures. In this paper, we first show that the proposed perceptron theory can predict the performance of Echo State Networks, which the previous theory could not describe. Second, we apply the perceptron theory to the last layers of shallow randomly connected and deep multi-layer networks. The full theory is based on Gaussian statistics, but it is analytically intractable. For problems with a small number of classes, we explore numerical methods to predict network performance; for problems with a large number of classes, we investigate stochastic sampling methods and a tractable approximation to the full theory. The quality of the predictions is assessed in three experimental settings: reservoir computing networks on a memorization task, shallow randomly connected networks on a collection of classification datasets, and deep convolutional networks on the ImageNet dataset. This study offers a simple, bipartite view of deep neural networks: the layers preceding the output layer encode the input into a high-dimensional representation, and the weights of the last layer map this representation into the postsynaptic sums of the output neurons. The proposed perceptron theory then uses the mean vector and covariance matrix of these postsynaptic sums to compute classification accuracies for the different classes. These first two moments of the distribution of the postsynaptic sums can predict the overall network performance quite accurately.
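To make the last step concrete: under the Gaussian model, the predicted accuracy for class c is the probability that the output neuron of class c attains the largest postsynaptic sum. For two classes this reduces to a single Gaussian integral over the difference of the two sums, P(s_1 > s_2) = Phi((mu_1 - mu_2) / sqrt(Var(s_1 - s_2))); for many classes no closed form is available, which is where the sampling and approximation methods mentioned above come in. The sketch below (a minimal Python/NumPy illustration with hypothetical names, not the paper's own code) estimates the per-class accuracy by Monte Carlo sampling from the fitted Gaussian; the overall accuracy would then be the class-frequency-weighted average of these per-class values.

```python
import numpy as np

def predicted_accuracy_for_class(mu, sigma, class_idx, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the per-class accuracy predicted by the
    perceptron theory: the probability that the output neuron of the
    correct class has the largest postsynaptic sum, assuming the sums
    follow a multivariate Gaussian with mean `mu` (K,) and covariance
    `sigma` (K, K). All names here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    sums = rng.multivariate_normal(mu, sigma, size=n_samples)  # (n_samples, K)
    # An input is classified correctly when its own output neuron "wins".
    return float(np.mean(np.argmax(sums, axis=1) == class_idx))

# Hypothetical three-class example: moments of the postsynaptic sums
# as they might be measured on held-out inputs belonging to class 0.
mu_0 = np.array([2.0, 0.5, 0.3])
sigma_0 = np.array([[0.6, 0.1, 0.1],
                    [0.1, 0.6, 0.1],
                    [0.1, 0.1, 0.6]])
print(predicted_accuracy_for_class(mu_0, sigma_0, class_idx=0))
```

Sampling is the natural fallback here because the winner-take-all probability for K > 2 classes is a K-dimensional orthant-type integral with no general closed form, exactly the intractability the abstract refers to.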
