Lossless Compression of Deep Neural Networks

Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition, where large networks are often used to obtain good accuracy. Consequently, it is challenging to deploy these networks under limited computational resources, such as on mobile devices. In this work, we introduce an algorithm that removes units and layers of a neural network without changing the output it produces, thus yielding a lossless compression. This algorithm, which we denote as LEO (Lossless Expressiveness Optimization), relies on Mixed-Integer Linear Programming (MILP) to identify Rectified Linear Units (ReLUs) whose behavior is linear over the input domain: stably inactive units can be removed outright, while stably active units can be folded into the subsequent layer. By using L1 regularization during training to induce such behavior, we can benefit from training over a larger architecture than the one we would later use in the environment where the trained network is deployed.
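To make the compression mechanism concrete, the sketch below illustrates the underlying idea on a small fully connected ReLU network in NumPy. It is not the paper's implementation: in place of the exact MILP test for linear behavior, it uses interval bound propagation, a cheaper sufficient condition that can only certify a subset of the stable units the MILP would find. Units whose preactivation is provably nonpositive over a box input domain are removed, and units whose preactivation is provably nonnegative are folded into the next layer as an affine term; the function computed over the domain is unchanged. All identifiers are illustrative, not from the paper.

import numpy as np

def preactivation_bounds(W, b, lo, hi):
    """Coordinate-wise bounds on W @ x + b for x in the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lower = W_pos @ lo + W_neg @ hi + b
    upper = W_pos @ hi + W_neg @ lo + b
    return lower, upper

def compress_layer(W1, b1, W2, b2, lo, hi):
    """Remove stably inactive units of the hidden layer (W1, b1) and fold
    stably active ones into its successor (W2, b2). Returns the pruned
    layers plus an affine term (W_skip, b_new) absorbing the folded units."""
    lower, upper = preactivation_bounds(W1, b1, lo, hi)
    inactive = upper <= 0.0           # ReLU output is identically zero
    active = lower >= 0.0             # ReLU acts as the identity
    keep = ~(inactive | active)       # genuinely nonlinear units

    # On the domain, ReLU(W1[active] @ x + b1[active]) equals its argument,
    # so the active units' contribution through W2 is affine in x.
    A = W2[:, active]
    W_skip = A @ W1[active]           # linear part absorbed from active units
    b_new = A @ b1[active] + b2       # bias absorbed from active units
    return W1[keep], b1[keep], W2[:, keep], W_skip, b_new

# Tiny two-layer network; the wide bias spread makes several units stable.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 10)), 5.0 * rng.normal(size=64)
W2, b2 = rng.normal(size=(5, 64)), rng.normal(size=5)
lo, hi = -np.ones(10), np.ones(10)    # box input domain [-1, 1]^10

W1c, b1c, W2c, W_skip, b_new = compress_layer(W1, b1, W2, b2, lo, hi)
x = rng.uniform(lo, hi)
original = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
compressed = W2c @ np.maximum(W1c @ x + b1c, 0.0) + W_skip @ x + b_new
assert np.allclose(original, compressed)  # identical outputs: lossless
print(f"kept {W1c.shape[0]} of {W1.shape[0]} hidden units")

The closing assertion checks that the compressed network reproduces the original output on a sample from the domain. The exact MILP check described in the abstract would classify strictly more units as stable, since interval bounds ignore correlations between inputs.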
