Commentary on “Sentential influences on acoustic-phonetic processing: a Granger causality analysis of multimodal imaging data”

In their paper, “Sentential Influences on Acoustic-Phonetic Processing: A Granger Causality Analysis of Multimodal Imaging Data”, Gow and Olson take on one of the most contentious issues in the speech perception field: Is the architecture of the speech perception system entirely bottom-up, or do higher-level representations (e.g. lexical or sentential) affect the operation of lower-level (e.g. phonetic) ones? In the literature, this argument has pitted “autonomous” models (e.g. Massaro, 1989; Norris, McQueen, & Cutler, 2000) against “interactive” ones (e.g. Grossberg, 1980; McClelland & Elman, 1986). Both model classes ultimately grew out of a conception of spoken word recognition – the cohort model – pioneered by Marslen-Wilson and his colleagues (e.g. Marslen-Wilson & Tyler, 1980; Marslen-Wilson & Welsh, 1978). The basic premise of the cohort model was that when a listener hears a spoken word, an initial “cohort” of lexical candidates is activated by the word’s beginning, and the cohort is then winnowed down to the correct item as more information becomes available. On the top-down versus bottom-up issue, the cohort theory posited a mixture: The generation of the initial cohort was argued to be determined by bottom-up factors (i.e. the set of lexical candidates that matched the acoustic-phonetic information during the first 150 msec or so of the word), but the winnowing process used both bottom-up mismatch information and top-down (semantic) information. Experiments using the gating paradigm (Tyler, 1984; Tyler & Wessels, 1983) provided early evidence for this position: a sentence’s semantic context did not seem to constrain the nature or number of initial candidates in the cohort, but it did affect the speed with which candidates were eliminated from contention, and thus how quickly the correct word was recognised.

As theorists moved beyond the cohort model’s starting point, they took two different directions: one picked up on the bottom-up mechanism as the driving force, while the other expanded the role of top-down processes. The ensuing argument has been going on for decades and has filled countless pages; it will not be resolved by a single finding or a single paper. In part, the durability of the argument reflects the different types of evidence that each side can bring to bear. Those who argue for interactivity produce demonstrations in which manipulating a higher-level factor affects the outcome at a lower level. Those who favour autonomous models either show null effects of a higher-level factor on outcomes at a lower level, or argue that the apparently top-down effects shown by others could instead be due to post-perceptual, decision-level influences.

The prototypical domain for these two positions is the lexical influence on phonetic identification first reported by Ganong (1980). Ganong showed that the interpretation of an ambiguous phonetic segment is affected by lexical factors. For example, a segment midway between /d/ and /t/ will be reported as “d” when it is followed by “-ash” (yielding the word “dash”), but the same segment will be reported as “t” when it is followed by “-ask” (yielding “task”). Theorists favouring interactive models take this outcome as evidence that lexical activation directly affects phonetic encoding, while theorists favouring autonomous models argue that the effect arises at a decision level, with listeners post-perceptually combining the lexical and sublexical information.
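The stalemate is easy to see in miniature. The sketch below is a toy of my own construction – not TRACE, not Merge, and not anything from Gow and Olson’s paper – with all weights invented. It implements both accounts of the Ganong effect: in the interactive version, hypothetical lexical support feeds back and shifts the phonetic code itself; in the autonomous version, the same lexical support enters only at a post-perceptual decision stage.

```python
# Toy contrast between interactive and autonomous accounts of the
# Ganong effect. All activations and weights are arbitrary inventions.

AMBIGUOUS_INPUT = {"d": 0.5, "t": 0.5}      # segment midway between /d/ and /t/

LEXICAL_SUPPORT = {
    "ash": {"d": 1.0, "t": 0.0},            # "dash" is a word, "tash" is not
    "ask": {"d": 0.0, "t": 1.0},            # "task" is a word, "dask" is not
}

def interactive_report(context, feedback_gain=0.3):
    """Interactive account: lexical activation feeds back and shifts the
    phonetic code itself before any decision is made."""
    phonetic = {seg: act + feedback_gain * LEXICAL_SUPPORT[context][seg]
                for seg, act in AMBIGUOUS_INPUT.items()}
    total = sum(phonetic.values())
    phonetic = {seg: round(act / total, 2) for seg, act in phonetic.items()}
    return phonetic, max(phonetic, key=phonetic.get)

def autonomous_report(context, decision_weight=0.3):
    """Autonomous account: the phonetic code is left untouched; lexical
    knowledge is combined with it only at a post-perceptual decision stage."""
    phonetic = dict(AMBIGUOUS_INPUT)        # the percept never changes
    decision = {seg: act + decision_weight * LEXICAL_SUPPORT[context][seg]
                for seg, act in phonetic.items()}
    return phonetic, max(decision, key=decision.get)

for context in ("ash", "ask"):
    p_i, r_i = interactive_report(context)
    p_a, r_a = autonomous_report(context)
    print(f"_{context}: interactive percept {p_i} -> '{r_i}'; "
          f"autonomous percept {p_a} -> '{r_a}'")
```

Crucially, the two versions produce identical overt reports (“d” in “-ash”, “t” in “-ask”) and differ only in whether the phonetic representation itself was altered; this is why behavioural identification data alone have not settled the debate, and why Gow and Olson turn instead to the time course of neural signals.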
As Gow and Olson note, one of the objections to interactive models is philosophical rather than empirical: If the system already has enough information to activate the correct word (e.g. “task” or “dash”), what purpose is served by going back and shifting the phonetic code (e.g. toward /t/ or toward /d/)? This argument also applies to the increasingly popular “forward models” that seem to be modern variants of “analysis by synthesis” (Poeppel & Monahan, 2011; Stevens, 1960), as in these models, too, higher-level hypotheses are used to reshape or predict the very lower-level codes from which they were derived.
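Because the commentary’s empirical fulcrum is Granger causality, it may help to have the core computation in view. The sketch below is the textbook bivariate autoregressive formulation, not Gow and Olson’s actual pipeline (which operates over many MEG/EEG source time series); the signals, coupling coefficient, and model order are all invented for illustration.

```python
# Minimal bivariate Granger causality sketch: does the past of y improve
# prediction of x beyond x's own past? Textbook formulation only.
import numpy as np

def granger(x, y, order=5):
    """Granger statistic for y -> x: log ratio of the residual variance of
    an autoregressive model of x without vs. with y's past as predictors."""
    n = len(x)
    target = x[order:]

    def lags(s):
        # Columns are s[t-1], ..., s[t-order] for each predicted sample.
        return np.column_stack(
            [s[order - 1 - k : n - 1 - k] for k in range(order)])

    def resid_var(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ beta)

    return np.log(resid_var(lags(x)) /
                  resid_var(np.hstack([lags(x), lags(y)])))

rng = np.random.default_rng(0)
T = 2000
y = rng.standard_normal(T)                  # y evolves on its own
x = np.zeros(T)
for t in range(1, T):                       # y's past drives x
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.3 * rng.standard_normal()

print(f"GC y -> x: {granger(x, y):.3f}")    # clearly positive
print(f"GC x -> y: {granger(y, x):.3f}")    # near zero
```

The directional asymmetry of the two statistics is the point: in Gow and Olson’s terms, top-down influence would appear as activity in higher-level (e.g. lexical or sentential) regions improving the prediction of subsequent activity in acoustic-phonetic regions, over and above those regions’ own history.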

[1] McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology.

[2] Tyler, L. K., & Wessels, J. (1983). Quantifying contextual contributions to word-recognition processes. Perception & Psychophysics.

[3] Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences.

[4] Tyler, L. K. (1984). The structure of the initial cohort: Evidence from gating. Perception & Psychophysics.

[5] Poeppel, D., & Monahan, P. J. (2011). Feedforward and feedback in speech perception: Revisiting analysis by synthesis. Language and Cognitive Processes.

[6] Connine, C. M., & Clifton, C. (1987). Interactive use of lexical information in speech perception. Journal of Experimental Psychology: Human Perception and Performance.

[7] Connine, C. M. (1987). Constraints on interactive processes in auditory word recognition: The role of sentence context. Journal of Memory and Language.

[8] Gow, D. W., Segawa, J. A., Ahlfors, S. P., & Lin, F.-H. (2008). Lexical influences on speech perception: A Granger causality analysis of MEG and EEG source estimates. NeuroImage.

[9] Connine, C. M., Blasko, D. G., & Hall, M. (1991). Effects of subsequent sentence context in auditory word recognition: Temporal and linguistic constraints. Journal of Memory and Language.

[10] Ganong, W. F. (1980). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance.

[11] Gow, D. W., & Olson, B. B. (2016). Sentential influences on acoustic-phonetic processing: A Granger causality analysis of multimodal imaging data. Language, Cognition and Neuroscience.

[12] Marslen-Wilson, W., & Tyler, L. K. (1980). The temporal structure of spoken language understanding. Cognition.

[13] Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review.

[14] Swinney, D. A. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior.

[15] Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology.

[16] Samuel, A. G. (1981). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General.

[17] Borsky, S., Tuller, B., & Shapiro, L. P. (1998). “How to milk a coat”: The effects of semantic and acoustic information on phoneme categorization. Journal of the Acoustical Society of America.

[18] Gow, D. W., & Caplan, D. N. (2012). New levels of language processing complexity and organization revealed by Granger causation. Frontiers in Psychology.

[19] Stevens, K. N. (1960). Toward a model for speech recognition. Journal of the Acoustical Society of America.

[20] Samuel, A. G. (2001). Knowing a word affects the fundamental perception of the sounds within it. Psychological Science.

[21] Massaro, D. W. (1989). Testing between the TRACE model and the fuzzy logical model of speech perception. Cognitive Psychology.

[22] Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science.