Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision

Convolutional neural networks (CNNs) trained for image recognition have been shown to explain cortical responses to static pictures in ventral-stream areas. Here, we further show that such a CNN can reliably predict and decode functional magnetic resonance imaging (fMRI) data from humans watching natural movies, despite its lack of any mechanism to account for temporal dynamics or feedback processing. Using separate data, encoding and decoding models were developed and evaluated to describe the bidirectional relationships between the CNN and the brain. Through the encoding models, the CNN-predicted areas covered not only the ventral stream but also the dorsal stream, albeit to a lesser degree; single-voxel responses were visualized as the specific pixel patterns that drove them, revealing the distinct representation at individual cortical locations; and cortical activation was synthesized from natural images at high throughput to map category representation, contrast, and selectivity. Through the decoding models, fMRI signals were directly decoded to estimate the feature representations in both visual and semantic spaces, for direct visual reconstruction and semantic categorization, respectively. These results corroborate, generalize, and extend previous findings, and highlight the value of using deep learning, as an all-in-one model of the visual cortex, to understand and decode natural vision.
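The bidirectional relationship described above can be illustrated with a minimal sketch. The abstract does not state how the encoding and decoding models were fitted, so this sketch assumes plain ridge regression in both directions and uses synthetic data in place of CNN features and fMRI responses. All names (`fit_ridge`, `columnwise_corr`, the array sizes, the noise level, and `alpha`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom


# Synthetic stand-ins (assumption): rows are time points, columns are
# CNN feature units or voxels. Real data would replace these arrays.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 400, 100, 20, 50
true_W = rng.normal(size=(n_feat, n_vox))

feat_train = rng.normal(size=(n_train, n_feat))
feat_test = rng.normal(size=(n_test, n_feat))
vox_train = feat_train @ true_W + 0.5 * rng.normal(size=(n_train, n_vox))
vox_test = feat_test @ true_W + 0.5 * rng.normal(size=(n_test, n_vox))

# Encoding direction: CNN features -> voxel responses,
# fitted on training data and scored on separate test data.
W_enc = fit_ridge(feat_train, vox_train, alpha=1.0)
enc_r = columnwise_corr(feat_test @ W_enc, vox_test)

# Decoding direction: voxel responses -> CNN features.
W_dec = fit_ridge(vox_train, feat_train, alpha=1.0)
dec_r = columnwise_corr(vox_test @ W_dec, feat_test)

print("mean encoding correlation per voxel:", enc_r.mean())
print("mean decoding correlation per feature:", dec_r.mean())
```

Fitting on one set and scoring on another mirrors the "separate data" evaluation mentioned above. With real recordings, the regularization strength would normally be chosen by cross-validation rather than fixed.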
