Do ImageNet Classifiers Generalize to ImageNet?

We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.

[1]  Dahua Lin,et al.  PolyNet: A Pursuit of Structural Diversity in Very Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[3]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[4]  Shuicheng Yan,et al.  Dual Path Networks , 2017, NIPS.

[5]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Weihong Deng,et al.  Very deep convolutional neural network based image classification using small training sample size , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[9]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[11]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[12]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Li Fei-Fei,et al.  Progressive Neural Architecture Search , 2017, ECCV.

[14]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[16]  Quoc V. Le,et al.  Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Policies from Data , 2018, ArXiv.

[18]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[19]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[20]  Aleksander Madry,et al.  A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations , 2017, ArXiv.

[21]  Junmo Kim,et al.  Deep Pyramidal Residual Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jia Deng,et al.  Large scale visual recognition , 2012 .

[23]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[24]  Cordelia Schmid,et al.  Dataset Issues in Object Recognition , 2006, Toward Category-Level Object Recognition.

[25]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[26]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[27]  Leslie Pack Kaelbling,et al.  Generalization in Deep Learning , 2017, ArXiv.

[28]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[29]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[32]  Fabio Roli,et al.  Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning , 2017, Pattern Recognit..

[33]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[34]  Avrim Blum,et al.  The Ladder: A Reliable Leaderboard for Machine Learning Competitions , 2015, ICML.

[35]  Jitendra Malik,et al.  Technical Perspective: What led computer vision to deep learning? , 2017, Commun. ACM.

[36]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Mingyan Liu,et al.  Spatially Transformed Adversarial Examples , 2018, ICLR.

[38]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[39]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[42]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Geometric Robustness of Deep Networks: Analysis and Improvement , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[44]  F. Perronnin,et al.  XRCE ’ s participation to ImagEval , 2007 .

[45]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[46]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[47]  Radha Poovendran,et al.  Semantic Adversarial Examples , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[48]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[49]  Matthias Bethge,et al.  Generalisation in humans and deep neural networks , 2018, NeurIPS.

[50]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[52]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[53]  Benjamin Recht,et al.  Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.

[54]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[55]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.

[56]  Pascal Frossard,et al.  Manitest: Are classifiers really invariant? , 2015, BMVC.