Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming

We introduce a mixed integer program (MIP) that assigns an importance score to each neuron in a deep neural network, guided by the impact of simultaneously pruning neurons on the network's main learning task. By carefully devising the MIP's objective function, we drive the solver to minimize the number of critical neurons (i.e., those with high importance scores) that must be kept to maintain the overall accuracy of the trained network. Further, the proposed formulation generalizes the recently considered lottery ticket optimization by identifying multiple "lucky" sub-networks, resulting in an optimized architecture that not only performs well on a single dataset but also generalizes across multiple datasets once the network weights are retrained. Finally, we present a scalable implementation of our method that decouples the importance scores across layers using auxiliary networks. We demonstrate that our formulation prunes neural networks with only a marginal loss in accuracy and generalizability on popular datasets and architectures.
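
To make the idea concrete, the sketch below encodes a deliberately simplified instance of this selection problem. It is an assumption-laden toy, not the paper's formulation: it considers a single input example and one hidden layer with frozen weights (so the ReLU activations are constants and the program stays linear), uses binary keep/prune variables in place of continuous importance scores, and substitutes random placeholder weights for a trained network. The solver minimizes the number of hidden neurons kept while forcing the pruned network to preserve the original prediction with a logit margin.

```python
# Minimal, hypothetical sketch (not the paper's full MIP): for a single input and
# a two-layer ReLU network with frozen weights, the hidden activations are
# constants, so deciding which hidden neurons to keep reduces to a pure 0/1
# integer program: keep as few neurons as possible while the pruned network still
# predicts the original class with a logit margin. Weights and data are random
# placeholders; requires numpy and PuLP (which bundles the CBC MIP solver).
import numpy as np
import pulp

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 3

# Frozen "trained" weights (random stand-ins for illustration only).
W1, b1 = rng.normal(size=(n_hidden, n_in)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), rng.normal(size=n_out)

x = rng.normal(size=n_in)
h = np.maximum(W1 @ x + b1, 0.0)       # hidden ReLU activations (constants for fixed x)
logits = W2 @ h + b2
true_class = int(np.argmax(logits))    # class predicted by the unpruned network
runner_up = np.partition(logits, -2)[-2]
margin = 0.5 * (logits[true_class] - runner_up)  # demand half the original margin

prob = pulp.LpProblem("critical_neurons", pulp.LpMinimize)
keep = [pulp.LpVariable(f"keep_{j}", cat="Binary") for j in range(n_hidden)]

# Objective: minimize the number of neurons that must be kept.
prob += pulp.lpSum(keep)

# Pruned logits are linear in the keep variables:
#   logit_k = sum_j W2[k, j] * h[j] * keep_j + b2[k]
def pruned_logit(k):
    return pulp.lpSum(W2[k, j] * h[j] * keep[j] for j in range(n_hidden)) + b2[k]

# The surviving sub-network must still predict `true_class` with the margin.
for k in range(n_out):
    if k != true_class:
        prob += pruned_logit(true_class) - pruned_logit(k) >= margin

prob.solve(pulp.PULP_CBC_CMD(msg=False))
kept = [j for j in range(n_hidden) if pulp.value(keep[j]) > 0.5]
print(f"kept {len(kept)}/{n_hidden} hidden neurons: {kept}")
```

The full formulation goes beyond this toy by scoring neurons across all layers and over batches of inputs, which is what requires the mixed integer machinery and motivates the decoupled, auxiliary-network implementation described above.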
