Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy

Neural network pruning is a popular technique for reducing the inference costs of modern, potentially overparameterized, networks. Starting from a pre-trained network, the process is as follows: remove redundant parameters, retrain, and repeat while maintaining the same test accuracy. The result is a model that is a fraction of the size of the original with comparable predictive performance (test accuracy). Here, we reassess and evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well across a wide spectrum of "harder" metrics, such as generalization to out-of-distribution data and resilience to noise. Across evaluations on varying architectures and data sets, we find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks. These results call into question the extent of genuine overparameterization in deep learning and raise concerns about the practicality of deploying pruned networks, specifically in the context of safety-critical systems, unless they are widely evaluated beyond test accuracy to reliably predict their performance. Our code is available at this https URL.
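For concreteness, the prune-retrain loop described above can be sketched as follows. This is a minimal illustration assuming PyTorch's built-in magnitude-pruning utilities; the `retrain` and `evaluate` callables, the `baseline_acc` value, and the per-round pruning fraction are hypothetical placeholders, and the L1 (smallest-magnitude) criterion is one common choice rather than necessarily the paper's exact method.

```python
# A minimal sketch of the iterative prune-retrain loop described in the
# abstract. `retrain`, `evaluate`, and `baseline_acc` are hypothetical
# placeholders supplied by the caller.
import torch
import torch.nn.utils.prune as prune

def iterative_prune(model, retrain, evaluate, baseline_acc,
                    amount=0.2, tolerance=0.005, max_rounds=20):
    """Prune a fraction `amount` of the remaining weights each round,
    retrain, and stop once test accuracy can no longer be maintained."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    for _ in range(max_rounds):
        # Globally remove the smallest-magnitude remaining weights.
        prune.global_unstructured(params,
                                  pruning_method=prune.L1Unstructured,
                                  amount=amount)
        retrain(model)         # fine-tune the surviving weights
        acc = evaluate(model)  # terminating condition: test accuracy only
        if acc < baseline_acc - tolerance:
            break
    return model
```

Note that the loop terminates solely on test accuracy; the paper's central argument is that this criterion alone does not reliably predict performance on harder metrics such as out-of-distribution generalization or noise resilience.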
