Deep convolutional models improve predictions of macaque V1 responses to natural images

Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have been successfully applied to neural data. On the one hand, transfer learning from networks trained on object recognition worked remarkably well for predicting neural responses in higher areas of the primate ventral stream, but has not yet been used to model spiking activity in early stages such as V1. On the other hand, data-driven models have been used to predict neural responses in the early visual system (retina and V1) of mice, but not primates. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. Even though V1 sits at an early to intermediate stage of the visual system, we found that the transfer learning approach performed similarly well to the data-driven approach, and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1, and deep features learned for object recognition are better explanations for V1 computation than all previous filter bank theories. This finding strengthens the case for V1 models that are multiple nonlinearities away from the image domain, and it supports the idea of explaining early visual cortex based on high-level functional goals.

Author summary

Predicting the responses of sensory neurons to arbitrary natural stimuli is of major importance for understanding their function. Arguably the most studied cortical area is primary visual cortex (V1), where many models have been developed to explain its function. However, the most successful models built on neurophysiologists' intuitions still fail to account for spiking responses to natural images. Here, we model spiking activity in monkey V1 using deep convolutional neural networks (CNNs), which have been successful in computer vision. We both trained CNNs directly to fit the data and used CNNs trained to solve a high-level task (object categorization). With these approaches, we are able to outperform previous models and improve the state of the art in predicting the responses of early visual neurons to natural images. Our results have two important implications. First, since V1 is the result of several nonlinear stages, it should be modeled as such. Second, functional models of entire visual pathways, of which V1 is an early stage, not only account for higher areas of such pathways, but also provide useful representations for V1 predictions.
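To make the transfer learning idea concrete, the following is a minimal sketch of fitting a readout from a fixed feature space to the spike counts of a single neuron. It is an illustration under stated assumptions, not the method used in the study: the exponential output nonlinearity, the Poisson loss, the L2 penalty, plain gradient descent, and the function name `fit_poisson_readout` are all choices made here, and the "features" are random numbers standing in for activations of a pre-trained CNN layer.

```python
import numpy as np

def fit_poisson_readout(features, spikes, l2=1e-3, lr=0.05, steps=500):
    """Fit rate = exp(features @ w + b) by gradient descent on a Poisson loss.

    Hypothetical helper for illustration; the feature space stays fixed and
    only the readout weights (w, b) are learned.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = np.log(spikes.mean() + 1e-8)  # start at the mean firing rate
    losses = []
    for _ in range(steps):
        log_rate = features @ w + b
        rate = np.exp(log_rate)
        # Poisson negative log-likelihood (constant log(k!) term dropped) plus L2 penalty
        losses.append(np.mean(rate - spikes * log_rate) + l2 * np.sum(w ** 2))
        err = rate - spikes
        w -= lr * (features.T @ err / n + 2 * l2 * w)
        b -= lr * err.mean()
    return w, b, losses

# Synthetic stand-in data (assumption): 400 "images", 20 feature dimensions.
rng = np.random.default_rng(0)
features = rng.normal(size=(400, 20))
true_w = rng.normal(scale=0.2, size=20)
spikes = rng.poisson(np.exp(features @ true_w))

w, b, losses = fit_poisson_readout(features, spikes)
```

In a data-driven variant, the feature space itself would also be learned from the neural recordings rather than held fixed, which is one plausible reason a pre-trained feature space needs less experimental data to reach the same performance.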
