Test Sample Accuracy Scales with Training Sample Density in Neural Networks

Generalization error bounds measure how much performance on unseen test data deviates from performance on training data. However, by providing a single scalar per model, they are input-agnostic. What if one wants to predict the error for a specific test sample? To answer this, we propose the novel paradigm of input-conditioned generalization error bounds. For piecewise linear neural networks, given a weighting function that relates the errors of different input activation regions to one another, we obtain a bound on each region’s generalization error that scales inversely with the density of training samples in that region. That is, more densely supported regions are more reliable. Because the bound is input-conditioned, it is, to our knowledge, the first generalization error bound applicable to the problems of detecting out-of-distribution and misclassified in-distribution samples for neural networks; we find that it performs competitively in both cases when tested on image classification tasks. Integrating the region-conditioned bound over regions yields a model-level bound implying that models with fewer activation patterns, i.e. a higher degree of information loss or abstraction, generalize better.
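To make the core idea concrete, the following is a minimal illustrative sketch, not the paper's actual bound or procedure: for a toy ReLU network, it scores each test input by how many training samples share its activation pattern (i.e. fall in the same linear region), reflecting the claim that densely supported regions are more reliable. All names here (e.g. region_density_scores) and the random-weight toy network are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer ReLU network with random weights stands in for a trained model.
W1 = rng.normal(size=(2, 16))
b1 = rng.normal(size=16)

def activation_pattern(x):
    """Binary pattern of which hidden ReLU units are active for input x."""
    return tuple((x @ W1 + b1 > 0).astype(int))

def region_density_scores(train_X, test_X):
    """Score each test point by how many training points share its activation region."""
    counts = {}
    for x in train_X:
        p = activation_pattern(x)
        counts[p] = counts.get(p, 0) + 1
    # Higher count -> denser region -> (by the abstract's argument) a tighter error bound.
    return np.array([counts.get(activation_pattern(x), 0) for x in test_X])

train_X = rng.normal(size=(500, 2))          # in-distribution training data
test_in = rng.normal(size=(5, 2))            # in-distribution test points
test_out = rng.normal(loc=6.0, size=(5, 2))  # shifted, out-of-distribution test points

print(region_density_scores(train_X, test_in))   # typically larger counts
print(region_density_scores(train_X, test_out))  # typically zero or near-zero counts
```

In this sketch, region density is just a raw count of co-located training samples; the paper's bound additionally involves a weighting function relating the errors of different regions, which is omitted here.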
