Models of the ventral stream that categorize and visualize images

An open question in systems neuroscience is which objective function (or computational “goal”) best describes the computations performed by the ventral stream (VS) of primate visual cortex. Substantial past research has suggested that object categorization could be such a goal. Recent experiments, however, showed that information about category-orthogonal properties such as object position and size is encoded with increasing explicitness along this pathway. Because that information is not necessarily needed for object categorization, we asked whether primate VS may do more than “just” object recognition. To address that question, we trained deep neural networks, all with the same architecture, on three different objectives: a supervised object categorization objective; an unsupervised autoencoder objective; and a semi-supervised objective that combined autoencoding with categorization. We then compared the image representations learned by these models to those observed in areas V4 and IT of macaque monkeys using canonical correlation analysis (CCA). We found that the semi-supervised model provided the best match to the monkey data, followed closely by the unsupervised model, and more distantly by the supervised one. These results suggest that multiple objectives (including, critically, unsupervised ones) might be essential for explaining the computations performed by primate VS.
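The CCA comparison mentioned above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function name `cca_correlations`, the QR-plus-SVD formulation, and the random stand-in data are assumptions made for illustration. It assumes each matrix has one row per image and one column per unit (model unit or recorded neuron), more images than units, and full column rank after centering.

```python
import numpy as np


def cca_correlations(X, Y):
    """Canonical correlations between two response matrices.

    X, Y: arrays of shape (n_images, n_units_x) and (n_images, n_units_y).
    Hypothetical helper; assumes n_images exceeds the unit counts and that
    both centered matrices have full column rank.
    """
    # Center each unit's responses across images.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip tiny numerical overshoot outside [0, 1].
    return np.clip(s, 0.0, 1.0)


# Stand-in data: random "model" and "neural" responses to 200 images.
rng = np.random.default_rng(0)
model_features = rng.standard_normal((200, 5))
neural_responses = rng.standard_normal((200, 5))

# One possible similarity score (an assumption, not necessarily the
# summary statistic used in the paper): the mean canonical correlation.
score = cca_correlations(model_features, neural_responses).mean()
print(score)
```

Because CCA is invariant to invertible linear transformations of either representation, two representations that differ only by such a transformation yield canonical correlations of 1, which is why it is a natural choice when model units and neurons have no one-to-one correspondence.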
