Unsupervised neural network models of the ventral visual stream

Significance: Primates show a remarkable ability to recognize objects. This ability is achieved by the ventral visual stream, a set of hierarchically interconnected brain areas. The best quantitative models of these areas are deep neural networks trained with human annotations. However, these networks receive many more annotations than infants do, making them implausible models of ventral stream development. Here, we report that recent progress in unsupervised learning has largely closed this gap. We find that networks trained with recent unsupervised methods achieve prediction accuracy in the ventral stream that equals or exceeds that of today’s best models. These results illustrate a use of unsupervised learning to model a brain system and present a strong candidate for a biologically plausible computational theory of sensory learning.

Abstract: Deep neural networks currently provide the best quantitative models of the response patterns of neurons throughout the primate ventral visual stream. However, such networks have remained implausible as models of the development of the ventral stream, in part because they are trained with supervised methods requiring many more labels than are accessible to infants during development. Here, we report that recent rapid progress in unsupervised learning has largely closed this gap. We find that neural network models learned with deep unsupervised contrastive embedding methods achieve neural prediction accuracy in multiple ventral visual cortical areas that equals or exceeds that of models derived using today’s best supervised methods, and that the mapping of these neural network models’ hidden layers is neuroanatomically consistent across the ventral stream. Strikingly, we find that these methods produce brain-like representations even when trained solely with real human child developmental data collected from head-mounted cameras, despite the fact that these datasets are noisy and limited. We also find that semisupervised deep contrastive embeddings can leverage small numbers of labeled examples to produce representations with substantially improved error-pattern consistency to human behavior. Taken together, these results illustrate a use of unsupervised learning to provide a quantitative model of a multiarea cortical brain system and present a strong candidate for a biologically plausible computational theory of primate sensory learning.
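To make the phrase "contrastive embedding" concrete, the sketch below computes a generic InfoNCE-style contrastive loss on toy embeddings. It is an illustration only and not the specific objective used in this work: the function names (`l2_normalize`, `info_nce_loss`), the temperature value, and the random toy data are all assumptions introduced here. The idea it shows is that two views of the same input should be more similar to each other than to views of other inputs, and no labels are needed to compute the loss.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss (hypothetical helper, assumed temperature).

    Row i of view_a and row i of view_b are treated as a positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature                        # (n, n) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives sit on the diagonal


# Toy data: two noisy "views" of the same 8 underlying embeddings.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
view_a = base + 0.05 * rng.normal(size=base.shape)
view_b = base + 0.05 * rng.normal(size=base.shape)

aligned_loss = info_nce_loss(view_a, view_b)
# Rolling view_b by one row breaks the pairing, so the loss should rise.
shuffled_loss = info_nce_loss(view_a, np.roll(view_b, 1, axis=0))
print(aligned_loss < shuffled_loss)
```

In a real training setup the two views would come from a network applied to differently augmented versions of the same image, and the loss would be minimized with respect to the network weights; the toy arrays here only stand in for those embeddings.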
