Interval Adjoint Significance Analysis for Neural Networks

The architecture of a neural network is a decisive factor in its computational cost and memory footprint. This paper presents a robust pruning method based on interval adjoint significance analysis that removes irrelevant and redundant nodes from a neural network. The significance of a node is defined as the product of the node's interval width and the absolute maximum of the first-order derivative (interval adjoint) of that node. These significance values also indicate how much can be pruned from each layer. A node is removed based on its significance, and the biases of the remaining nodes are updated to compensate for the removal. Experiments on well-known and challenging machine learning datasets show that the proposed method works effectively on both hidden and input layers.
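To make the significance computation concrete, the following is a minimal sketch assuming interval activations and reverse-mode interval adjoints are already available for each node. The names `Interval`, `select_nodes`, and `fold_bias`, the `keep_ratio` parameter, and the bias-compensation rule (folding a pruned node's interval midpoint into the receiving bias) are illustrative assumptions, not the paper's exact procedure:

```python
# A minimal sketch, not the paper's implementation. All names and the
# keep_ratio / bias-folding details below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    @property
    def width(self) -> float:
        return self.hi - self.lo

    @property
    def midpoint(self) -> float:
        return 0.5 * (self.lo + self.hi)

def significance(value: Interval, adjoint: Interval) -> float:
    """Significance of a node: its interval width times the absolute
    maximum of the first-order derivative (interval adjoint)."""
    return value.width * max(abs(adjoint.lo), abs(adjoint.hi))

def select_nodes(values, adjoints, keep_ratio):
    """Rank a layer's nodes by significance; return indices to keep."""
    scores = [significance(v, a) for v, a in zip(values, adjoints)]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    n_keep = max(1, int(keep_ratio * len(scores)))
    return sorted(ranked[:n_keep])

def fold_bias(bias: float, removed_value: Interval, weight: float) -> float:
    """Assumed compensation step: absorb a pruned node's mean contribution
    (interval midpoint times its outgoing weight) into the receiving bias."""
    return bias + weight * removed_value.midpoint

# Example: prune the less significant of two hidden nodes.
vals = [Interval(-0.2, 0.9), Interval(0.40, 0.41)]
adjs = [Interval(-1.5, 2.0), Interval(-0.10, 0.05)]
keep = select_nodes(vals, adjs, keep_ratio=0.5)   # -> [0]
new_bias = fold_bias(0.1, vals[1], weight=0.3)    # compensate for node 1
```

In this sketch, a node whose activation barely varies (narrow interval) or whose adjoint is near zero everywhere contributes little to the output, so it ranks low and is pruned first; its mean contribution is then folded into the downstream bias so the network's expected behavior is preserved.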
