Are Out-of-Distribution Detection Methods Effective on Large-Scale Datasets?

Supervised classification methods often assume that the train and test data distributions are the same and that all classes in the test set are present in the training set. However, deployed classifiers often need to recognize inputs from outside the training set as unknowns. This problem has been studied under multiple paradigms, including out-of-distribution detection and open set recognition. For convolutional neural networks, there have been two major approaches: 1) inference methods to separate knowns from unknowns and 2) feature space regularization strategies to improve model robustness to outlier inputs. There has been little effort to explore the relationship between the two approaches or to directly compare their performance on anything other than small-scale datasets with at most 100 categories. Using ImageNet-1K and Places-434, we identify novel combinations of regularization and specialized inference methods that perform best across multiple outlier detection problems of increasing difficulty. We find that input perturbation and temperature scaling yield the best performance on large-scale datasets regardless of the feature space regularization strategy. Improving the feature space by regularizing against a background class can be helpful if an appropriate background class can be found, but this is impractical for large-scale image classification datasets.
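To make the two inference-time techniques named above concrete, here is a minimal sketch of temperature scaling combined with input perturbation. It is an illustration under stated assumptions, not the implementation evaluated in the paper: a linear softmax classifier (`W`, `b`) stands in for the convolutional network so that the input gradient can be written analytically, and the function name `perturbed_max_softmax` and the default `temperature` and `epsilon` values are hypothetical choices.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def perturbed_max_softmax(x, W, b, temperature=1000.0, epsilon=0.001):
    """Confidence score from temperature scaling plus input perturbation.

    Assumption: a linear classifier (logits = W @ x + b) replaces the CNN,
    so the gradient of the log-softmax with respect to x has a closed form.
    Higher scores suggest a known (in-distribution) input.
    """
    # Temperature-scaled softmax on the clean input.
    p = softmax((W @ x + b) / temperature)
    y_hat = int(np.argmax(p))

    # Gradient of log p[y_hat] with respect to x for the linear model:
    # (1 / T) * W^T (onehot(y_hat) - p).
    onehot = np.zeros_like(p)
    onehot[y_hat] = 1.0
    grad = W.T @ (onehot - p) / temperature

    # Nudge the input in the direction that raises the predicted-class
    # log-probability, then rescore with the same temperature.
    x_pert = x + epsilon * np.sign(grad)
    p_pert = softmax((W @ x_pert + b) / temperature)
    return float(p_pert.max())


if __name__ == "__main__":
    # Toy 3-class, 2-feature model; all values are made up for illustration.
    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    b = np.zeros(3)
    x = np.array([1.0, 0.5])
    print(perturbed_max_softmax(x, W, b, temperature=1.0, epsilon=0.01))
```

In practice, an input would be flagged as unknown when this score falls below a threshold chosen on held-out data; the threshold, temperature, and perturbation size would all need tuning for a real network.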
