Predicting Deep Neural Network Generalization with Perturbation Response Curves

The field of Deep Learning is rich with empirical evidence of human-like performance on a variety of prediction tasks. However, despite these successes, the recent Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition [10] suggests that there is a need for more robust and efficient measures of network generalization. In this work, we propose a new framework for evaluating the generalization capabilities of trained networks. We use perturbation response (PR) curves, which capture the change in accuracy of a given network as a function of varying levels of training sample perturbation. From these PR curves, we derive novel statistics that capture generalization capability. Specifically, we introduce two new measures for accurately predicting generalization gaps: the Gi-score and Pal-score, inspired by the Gini coefficient and Palma ratio, respectively, two measures of income inequality. Applying our framework to intra- and inter-class sample mixup, we attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the PGDL competition. In addition, we show that our framework and the proposed statistics can be used to quantify the extent to which a trained network is invariant to a given parametric input transformation, such as rotation or translation. Therefore, these generalization gap prediction statistics also provide a useful means for selecting optimal network architectures and hyperparameters that are invariant to a given perturbation.
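
To make the framework concrete, the sketch below shows how Gini- and Palma-style statistics can be read off a PR curve. This is a minimal illustration rather than the authors' implementation: it assumes the PR curve is already available as accuracies measured at evenly spaced perturbation levels, and the function names gi_score and pal_score, the Lorenz-curve construction, and the 10%/40% split are illustrative choices, not the paper's exact definitions.

# Illustrative sketch: Gini- and Palma-style statistics from a perturbation
# response (PR) curve. Assumes the curve is a 1-D array of accuracies at
# increasing perturbation levels; names and normalizations are hypothetical.
import numpy as np

def gi_score(pr_curve):
    """Gini-coefficient-style statistic over a PR curve.

    Higher values mean the accuracy mass is concentrated in a few
    (lightly perturbed) levels, i.e. accuracy collapses quickly.
    """
    x = np.sort(np.asarray(pr_curve, dtype=float))  # ascending, as in a Lorenz curve
    cum = np.cumsum(x) / x.sum()                    # cumulative share of accuracy mass
    n = len(x)
    # Gini = 1 - 2 * (area under the Lorenz curve), trapezoidal approximation
    lorenz_area = np.trapz(np.concatenate(([0.0], cum)), dx=1.0 / n)
    return 1.0 - 2.0 * lorenz_area

def pal_score(pr_curve, top=0.1, bottom=0.4):
    """Palma-ratio-style statistic: accuracy share of the top `top` fraction
    of levels divided by the share of the bottom `bottom` fraction."""
    x = np.sort(np.asarray(pr_curve, dtype=float))
    n = len(x)
    k_bottom = max(1, int(np.floor(bottom * n)))
    k_top = max(1, int(np.ceil(top * n)))
    return x[-k_top:].sum() / (x[:k_bottom].sum() + 1e-12)  # epsilon guards against zero accuracy

# Two hypothetical PR curves over 11 perturbation levels: one network whose
# accuracy decays gradually and one whose accuracy collapses quickly.
levels = np.linspace(0.0, 1.0, 11)
gradual = 0.95 - 0.2 * levels
collapsing = 0.95 * np.exp(-5.0 * levels)
for name, curve in [("gradual", gradual), ("collapsing", collapsing)]:
    print(name, round(gi_score(curve), 3), round(pal_score(curve), 3))

In a full pipeline, each accuracy in pr_curve would itself be produced by perturbing training samples (for example, mixing pairs of samples at increasing interpolation strengths) and re-evaluating the trained network at every perturbation level.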

[1] Ioannis Mitliagkas, et al. Manifold Mixup: Better Representations by Interpolating Hidden States, 2018, ICML.

[2] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[3] David J. Hand, et al. A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems, 2001, Machine Learning.

[4] W. Verstraete, et al. Initial community evenness favours functionality under selective stress, 2009, Nature.

[5] P. Graczyk. Gini coefficient: a new way to express selectivity of kinase inhibitors against a family of kinases, 2007, Journal of Medicinal Chemistry.

[6] Yukiko Asada, et al. Assessment of the health of Americans: the average health-related quality of life and its inequality across individuals and groups, 2005, Population Health Metrics.

[7] Mohamed Bekkar, et al. Evaluation Measures for Models Assessment over Imbalanced Data Sets, 2013.

[8] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.

[9] Timothy A. Gonsalves, et al. Feature Selection for Text Classification Based on Gini Coefficient of Inequality, 2010, FSDM.

[10] Hossein Mobahi, et al. NeurIPS 2020 Competition: Predicting Generalization in Deep Learning, 2020, arXiv.

[11] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[12] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.

[13] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[14] Ekin D. Cubuk, et al. A Fourier Perspective on Model Robustness in Computer Vision, 2019, NeurIPS.

[15] D. Pfeffermann, et al. Small area estimation, 2011.

[16] Sumukh K. Aithal, et al. Robustness to Augmentations as a Generalization metric, 2021, arXiv.

[17] Quoc V. Le, et al. Measuring Invariances in Deep Networks, 2009, NIPS.

[18] Hossein Mobahi, et al. Predicting the Generalization Gap in Deep Networks with Margin Distributions, 2018, ICLR.

[19] Andrew Gordon Wilson, et al. Learning Invariances in Neural Networks, 2020, NeurIPS.

[20] Amos J. Storkey, et al. School of Informatics, University of Edinburgh, 2022.

[21] Andrew Zisserman, et al. Automated Flower Classification over a Large Number of Classes, 2008, Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[22] Joan Bruna, et al. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, 2021, arXiv.

[23] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[25] Qiang Chen, et al. Network In Network, 2013, ICLR.

[26] C. V. Jawahar, et al. Cats and dogs, 2012, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Donald W. Bouldin, et al. A Cluster Separation Measure, 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28] Yair Weiss, et al. Why do deep convolutional networks generalize so poorly to small image transformations?, 2018, Journal of Machine Learning Research.

[29] Peter J. Lambert, et al. Inequality Decomposition Analysis and the Gini Coefficient Revisited, 1993.

[30] M. O. Lorenz. Methods of Measuring the Concentration of Wealth, 1905, Publications of the American Statistical Association.

[31] A. Sumner, et al. Is It All About the Tails? The Palma Measure of Income Inequality, 2013.

[32] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.

[33] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, arXiv.

[34] Manik Sharma, et al. Representation Based Complexity Measures for Predicting Generalization in Deep Learning, 2020, arXiv.

[35] Patrick Thiran, et al. Generalization Comparison of Deep Neural Networks via Output Sensitivity, 2020, 25th International Conference on Pattern Recognition (ICPR).

[36] Vinay Uday Prabhu, et al. Large image datasets: A pyrrhic win for computer vision?, 2021, IEEE Winter Conference on Applications of Computer Vision (WACV).