Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection

Detecting out-of-distribution (OOD) inputs has been a critical issue for neural networks deployed in the open world. However, the unstable behavior of OOD detection along the optimization trajectory during training has not been clearly explored. In this paper, we first find that OOD detection suffers from overfitting and instability during training: 1) performance can degrade even when the training error is near zero, and 2) performance can fluctuate sharply in the final stage of training. Based on these findings, we propose Average of Pruning (AoP), which combines model averaging and pruning, to mitigate these unstable behaviors. Specifically, model averaging helps achieve stable performance by smoothing the loss landscape, and pruning mitigates overfitting by removing redundant features. Comprehensive experiments on various datasets and architectures verify the effectiveness of our method.
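
The abstract does not specify implementation details, so the following is a minimal sketch of how the two AoP components could be combined, assuming a PyTorch setup, an SWA-style running weight average over the training trajectory, and unstructured global magnitude pruning. The function names (update_weight_average, global_magnitude_prune), the 0.5 sparsity ratio, and the training-loop helpers in the usage comments are hypothetical, not the authors' exact procedure.

```python
import copy  # used in the usage sketch below to clone the model

import torch
import torch.nn as nn


@torch.no_grad()
def update_weight_average(avg_model: nn.Module, model: nn.Module, n_averaged: int) -> int:
    """Fold the current weights into a running average of the training trajectory
    (in the spirit of stochastic weight averaging)."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(n_averaged / (n_averaged + 1)).add_(p.detach(), alpha=1.0 / (n_averaged + 1))
    return n_averaged + 1


@torch.no_grad()
def global_magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights across all weight tensors
    (unstructured global magnitude pruning); the sparsity ratio is illustrative."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * weights.numel())
    if k == 0:
        return
    threshold = weights.kthvalue(k).values
    for p in model.parameters():
        if p.dim() > 1:
            p.mul_((p.abs() > threshold).to(p.dtype))


# Hypothetical usage: average the weights along the trajectory during training,
# then prune the averaged model before computing OOD scores with it.
#
# model = build_network()                    # the network being trained (not shown)
# avg_model, n = copy.deepcopy(model), 0
# for epoch in range(num_epochs):
#     train_one_epoch(model)                 # standard supervised training (not shown)
#     n = update_weight_average(avg_model, model, n)
# global_magnitude_prune(avg_model, sparsity=0.5)
# ood_scores = score_ood(avg_model, loader)  # e.g., MSP or energy scoring (not shown)
```

The intent of the sketch is only to show the division of labor the abstract describes: the averaged model smooths out the sharp epoch-to-epoch fluctuations in detection performance, while pruning the averaged weights removes redundant features that drive overfitting.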
