The Origins and Prevalence of Texture Bias in Convolutional Neural Networks

Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not from differences in their internal workings, but from differences in the data that they see.
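The abstract's augmentation recipe is concrete enough to sketch. Below is a minimal illustration, not the authors' exact pipeline: a torchvision training transform that crops less aggressively than the common ImageNet default and adds naturalistic color distortion, blur, and noise. The specific scale range, jitter strengths, blur kernel, and noise level are illustrative assumptions, and AddGaussianNoise is a hypothetical helper, not a torchvision class.

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add pixel-wise Gaussian noise to a [0, 1] image tensor.
    Hypothetical helper; the noise level is an assumption."""
    def __init__(self, std=0.05):
        self.std = std

    def __call__(self, img):
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

train_transform = transforms.Compose([
    # Less aggressive cropping: keep at least ~40% of the image area
    # rather than the common (0.08, 1.0) ImageNet default, so more of
    # the object's global shape survives each crop.
    transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),
    transforms.RandomHorizontalFlip(),
    # Naturalistic appearance changes that perturb local texture
    # statistics while leaving shape intact.
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),
    transforms.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),
])
```

The design intuition matches the abstract: each of these perturbations degrades texture cues (color, high-frequency detail, pixel statistics) more than shape cues, so a model trained under them is pushed toward shape-based decisions.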
