The Organization of a Neurocomputational Control Model for Articulatory Speech Synthesis

This paper outlines the organization of a computational control model for articulatory speech synthesis. The model is based on general principles of neurophysiology and cognitive psychology: it comprises neural control circuits, neural maps, and mappings of the kind hypothesized to exist in the human brain, and it relies on learning or training mechanisms similar to those occurring during human speech acquisition. The task of the control module is to generate articulatory data for controlling an articulatory-acoustic speech synthesizer. Thus a complete "BIONIC" (i.e., BIOlogically motivated and techNICally realized) speech synthesizer is described. It is capable of generating linguistic, sensory, and motor neural representations of sounds, syllables, and words; of generating articulatory speech movements from neuromuscular activation; and subsequently of generating acoustic speech signals by controlling an articulatory-acoustic vocal tract model. The module developed thus far can produce single sounds (vowels and consonants), simple CV and VC syllables, and first sample words. In addition, processes of human-human interaction occurring during speech acquisition (mother-child or carer-child interactions) are briefly discussed.
