The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks

Abstract

Visual object representations are commonly thought to emerge rapidly, yet it has remained unclear to what extent early brain responses reflect purely low‐level visual features of these objects and how strongly those features contribute to later categorical or conceptual representations. Here, we aimed to estimate a lower temporal bound for the emergence of conceptual representations by defining two criteria that characterize such representations: 1) conceptual object representations should generalize across different exemplars of the same object, and 2) these representations should reflect high‐level behavioral judgments. To test these criteria, we compared magnetoencephalography (MEG) recordings between two groups of participants (n = 16 per group) exposed to different exemplar images of the same object concepts. Further, we disentangled low‐level from high‐level MEG responses by estimating the unique and shared contribution of models of behavioral judgments, semantics, and different layers of deep neural networks of visual object processing. We find that 1) both generalization across exemplars and generalization of object‐related signals across time increase after 150 ms, peaking around 230 ms; 2) representations specific to behavioral judgments emerge rapidly, peaking around 160 ms. Collectively, these results suggest a lower bound for the emergence of conceptual object representations around 150 ms following stimulus onset.

Highlights

Used MEG to reveal lower bound for emergence of conceptual object representations.
Two criteria: between‐exemplar generalization and relationship to behavior.
MEG pattern similarity between exemplars rises after 160 ms.
Model of behavior explains unique MEG variance after 150 ms.
Earlier MEG response well captured by early layer of deep neural network model.
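The idea of estimating the unique and shared contribution of several models can be illustrated with a small variance-partitioning sketch. This is not the authors' analysis pipeline: it assumes only two predictors (the abstract mentions three model families), ordinary least-squares regression, and synthetic data, and the names `r_squared`, `partition_variance`, `behavior`, `dnn_layer`, and `meg` are hypothetical. In practice the vectors would hold pairwise dissimilarities at one MEG time point.

```python
import numpy as np

def r_squared(predictors, target):
    """R^2 of an ordinary least-squares fit of target on predictors, with an intercept."""
    X = np.column_stack([np.ones(len(target))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ beta
    total = target - target.mean()
    return 1.0 - (residual @ residual) / (total @ total)

def partition_variance(model_a, model_b, target):
    """Split explained variance into parts unique to each model and a shared part."""
    r2_a = r_squared([model_a], target)
    r2_b = r_squared([model_b], target)
    r2_ab = r_squared([model_a, model_b], target)
    return {
        "unique_a": r2_ab - r2_b,      # gained by adding model A to model B
        "unique_b": r2_ab - r2_a,      # gained by adding model B to model A
        "shared": r2_a + r2_b - r2_ab, # explained by either model alone
        "total": r2_ab,
    }

# Synthetic example (assumption): two correlated model vectors and a noisy target.
rng = np.random.default_rng(0)
n_pairs = 200
common = rng.normal(size=n_pairs)
behavior = common + rng.normal(size=n_pairs)
dnn_layer = common + rng.normal(size=n_pairs)
meg = behavior + 0.5 * dnn_layer + rng.normal(size=n_pairs)

parts = partition_variance(behavior, dnn_layer, meg)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

Repeating such a partition at every time point would yield time courses of unique and shared variance, which is the kind of quantity the abstract refers to when it reports variance specific to behavioral judgments.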
