Generalisation in humans and deep neural networks

We compare the robustness of humans and current convolutional deep neural networks (DNNs) on object recognition under twelve different types of image degradations. First, using three well-known DNNs (ResNet-152, VGG-19, GoogLeNet), we find the human visual system to be more robust to nearly all of the tested image manipulations, and we observe progressively diverging classification error patterns between humans and DNNs as the signal gets weaker. Secondly, we show that DNNs trained directly on distorted images consistently surpass human performance on the exact distortion types they were trained on, yet they generalise extremely poorly when tested on other distortion types. For example, training on salt-and-pepper noise does not imply robustness to uniform white noise, and vice versa. Thus, changes in the noise distribution between training and testing constitute a crucial challenge to deep learning vision systems, one that can be systematically addressed in a lifelong machine learning approach. Our new dataset, consisting of 83K carefully measured human psychophysical trials, provides a useful reference, set by the human visual system, for lifelong robustness against image degradations.
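To make the contrast between the two noise types named above concrete, the following is a minimal sketch using only the Python standard library. It is an illustration, not the procedure used in the study: the function names, the `width` and `p` parameters, and the mid-grey test image are all assumptions introduced here, and the abstract does not state how the distortions were actually parametrised.

```python
import random


def add_uniform_noise(image, width, rng):
    """Add independent noise drawn uniformly from [-width, width]
    to every pixel, then clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.uniform(-width, width))) for px in row]
            for row in image]


def add_salt_and_pepper_noise(image, p, rng):
    """With probability p, replace a pixel by 1.0 (salt) or 0.0 (pepper),
    chosen with equal odds; otherwise leave the pixel unchanged."""
    return [[(1.0 if rng.random() < 0.5 else 0.0) if rng.random() < p else px
             for px in row]
            for row in image]


rng = random.Random(0)
# Hypothetical stand-in for a greyscale image with intensities in [0, 1].
image = [[0.5] * 64 for _ in range(64)]

uniform = add_uniform_noise(image, 0.2, rng)
salt_pepper = add_salt_and_pepper_noise(image, 0.2, rng)
```

Uniform noise perturbs every pixel by a small bounded amount, whereas salt-and-pepper noise leaves most pixels untouched and drives a few to the extremes. The two distortions therefore produce very different pixel statistics, which is one way to picture why robustness to one need not transfer to the other.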
