Selective Brain Damage: Measuring the Disparate Impact of Model Pruning

Neural network pruning techniques have demonstrated that it is possible to remove the majority of weights in a network with surprisingly little degradation in test set accuracy. However, this aggregate measure of performance conceals significant differences in how individual classes and images are impacted by pruning. We find that certain examples, which we term pruning identified exemplars (PIEs), and certain classes are systematically more impacted by the introduction of sparsity. Removing PIE images from the test set greatly improves top-1 accuracy for both pruned and non-pruned models. These hard-to-generalize-to images tend to be mislabelled, to be of lower image quality, to depict multiple objects, or to require fine-grained classification. These findings shed light on previously unknown trade-offs and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.
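
The abstract does not spell out how PIEs are identified. One natural criterion, sketched below as an assumption rather than the paper's confirmed procedure, is to flag test images where the modal (most frequent) prediction across a population of independently trained pruned models disagrees with the modal prediction across a population of non-pruned models. The helper names `modal_prediction` and `find_pies` are illustrative, not from the paper:

```python
import numpy as np

def modal_prediction(preds):
    """Most frequent predicted label per image across a model population.

    preds: integer array of shape [n_models, n_images], one row of
    test-set predictions per independently trained model.
    """
    preds = np.asarray(preds)
    modal = np.empty(preds.shape[1], dtype=preds.dtype)
    for i in range(preds.shape[1]):
        labels, counts = np.unique(preds[:, i], return_counts=True)
        modal[i] = labels[np.argmax(counts)]
    return modal

def find_pies(dense_preds, pruned_preds):
    """Flag pruning identified exemplars (PIEs): images where the modal
    prediction of the pruned population disagrees with that of the
    non-pruned (dense) population."""
    return modal_prediction(dense_preds) != modal_prediction(pruned_preds)

# Hypothetical usage: re-score top-1 accuracy with PIEs masked out,
# mirroring the abstract's observation that removing PIEs improves
# accuracy for both pruned and non-pruned models.
# dense_preds, pruned_preds: [n_models, n_images]; labels: [n_images]
# is_pie = find_pies(dense_preds, pruned_preds)
# top1_no_pies = (modal_prediction(pruned_preds)[~is_pie]
#                 == labels[~is_pie]).mean()
```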
