Test Sample Accuracy Scales with Training Sample Density in Neural Networks

Intuitively, one would expect the accuracy of a trained neural network’s prediction on a test sample to correlate with how densely that sample is surrounded by seen training samples in representation space. In this work we provide theory and experiments that support this hypothesis. We propose an error function for piecewise linear neural networks that takes a local region in the network’s input space and outputs a smooth empirical training error: an average of the empirical training errors of other regions, weighted by distance in the network’s representation space. A bound on the expected smooth error of each region scales inversely with the density of training samples in representation space. Empirically, we verify that this bound is a strong predictor of how inaccurate the network’s predictions are on test samples. For unseen test sets, including those with out-of-distribution samples, ranking test samples by their local region’s error bound and discarding the samples with the highest bounds raises prediction accuracy by up to 20% in absolute terms on image classification datasets.
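
The selection procedure described above can be sketched roughly as follows. This is an illustrative NumPy sketch under simplifying assumptions, not the paper’s exact error function or bound: it weights per-training-sample errors rather than per-region errors, and the names `smooth_error`, `temperature`, and `keep_fraction` are placeholders introduced here for illustration.

```python
import numpy as np

def smooth_error(test_reps, train_reps, train_errors, temperature=1.0):
    """Distance-weighted average of training errors in representation space.

    test_reps:    (n_test, d)  network representations of test samples
    train_reps:   (n_train, d) network representations of training samples
    train_errors: (n_train,)   empirical training error per training sample
    """
    # Pairwise Euclidean distances in representation space.
    dists = np.linalg.norm(test_reps[:, None, :] - train_reps[None, :, :], axis=-1)
    # Closer training samples receive larger weight (softmax over negative distance).
    weights = np.exp(-dists / temperature)
    weights /= weights.sum(axis=1, keepdims=True)
    # Smooth error per test sample: low where training samples are dense and accurate.
    return weights @ train_errors

def select_by_smooth_error(scores, keep_fraction=0.8):
    """Keep the test samples with the lowest smooth-error scores,
    discarding the fraction with the highest (least trustworthy) scores."""
    n_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[:n_keep]
```

In this sketch, predictions would be evaluated only on the indices returned by `select_by_smooth_error`, mirroring the abstract’s ranking-and-discarding experiment; the kernel form and the fraction discarded are assumptions, not values taken from the paper.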
