The architecture of speech production and the role of the phoneme in speech processing

Speech production has been studied within a number of traditions, including linguistics, psycholinguistics, motor control, neuropsychology and neuroscience. These traditions have had limited interaction, ostensibly because they target different levels of speech production or different dimensions such as representation, processing or implementation. However, closer examination reveals a substantial convergence of ideas across the traditions, and recent proposals have suggested that an integrated approach may help move the field forward. The present article reviews one such attempt at integration, the state feedback control (SFC) model, and its descendant, the hierarchical SFC model. Also considered is how phoneme-level representations might fit in the context of the model.
