Balancing Prediction and Sensory Input in Speech Comprehension: The Spatiotemporal Dynamics of Word Recognition in Context

Spoken word recognition in context is remarkably fast and accurate, with recognition times of ∼200 ms, typically well before the end of the word. The neurocomputational mechanisms underlying these contextual effects are still poorly understood. This study combines source-localized electroencephalographic and magnetoencephalographic (EMEG) measures of real-time brain activity with multivariate representational similarity analysis (RSA) to determine directly the timing and computational content of the processes evoked as spoken words are heard in context, and to evaluate the respective roles of bottom-up and predictive processing mechanisms in the integration of sensory and contextual constraints. Male and female human participants heard simple (modifier-noun) English phrases that varied in the degree of semantic constraint that the modifier (W1) exerted on the noun (W2), as in "yellow banana." We used gating tasks to generate estimates of the probabilistic predictions generated by these constraints, as well as measures of their interaction with the bottom-up perceptual input for W2. RSA models of these measures were tested against EMEG brain data across a bilateral fronto-temporo-parietal language network. Consistent with probabilistic predictive processing accounts, we found early activation of semantic constraints in frontal cortex (LBA45) as W1 was heard. The effects of these constraints (at 100 ms after W2 onset in left middle temporal gyrus and at 140 ms in left Heschl's gyrus) were only detectable, however, after the initial phonemes of W2 had been heard. Within an overall predictive processing framework, bottom-up sensory inputs are still required to achieve early and robust spoken word recognition in context.
SIGNIFICANCE STATEMENT

Human listeners recognize spoken words in natural speech contexts with remarkable speed and accuracy, often identifying a word well before all of it has been heard. In this study, we investigate the brain systems that support this important capacity, using neuroimaging techniques that can track real-time brain activity during speech comprehension. This makes it possible to locate the brain areas that generate predictions about upcoming words and to show how these expectations are integrated with the evidence provided by the speech being heard. We use the timing and localization of these effects to provide the most specific account to date of how the brain achieves an optimal balance between prediction and sensory input in the interpretation of spoken language.
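The core RSA computation described above can be sketched in a few lines. This is a minimal, generic illustration, not the authors' actual pipeline: the function names are invented, and the specific choices of correlation distance for the data RDM and a Spearman rank correlation for the model-to-data fit are common RSA defaults assumed here for concreteness. In practice each "condition" row would be the source-space EMEG response pattern for one word within a searchlight and time window.

```python
import numpy as np

def correlation_rdm(patterns):
    """Condensed correlation-distance RDM (1 - Pearson r) over condition rows.

    patterns: (n_conditions, n_features) array of activity patterns.
    Returns the upper triangle of the dissimilarity matrix as a 1-D vector.
    """
    c = np.corrcoef(patterns)                 # pairwise pattern correlations
    iu = np.triu_indices_from(c, k=1)         # upper triangle, no diagonal
    return 1.0 - c[iu]

def spearman(a, b):
    """Spearman rank correlation (no tie correction; fine for continuous data)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

def rsa_fit(brain_patterns, model_rdm):
    """Correlate a model RDM with the data RDM built from brain patterns."""
    return spearman(correlation_rdm(brain_patterns), model_rdm)
```

In a spatiotemporal searchlight analysis, `rsa_fit` would be evaluated repeatedly across cortical locations and sliding time windows, and the resulting fit maps assessed with cluster-based permutation statistics.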
