A Gesture-Based Concept for Speech Movement Control in Articulatory Speech Synthesis

This paper introduces and discusses an articulatory speech synthesizer that comprises a three-dimensional vocal tract model and a gesture-based concept for controlling articulatory movements. A modular learning concept based on speech perception is outlined for creating the gestural control rules. The learning concept draws on sensory feedback information about articulatory states produced by the model itself, and on auditory and visual information about speech items produced by external speakers. The complete model (control module and synthesizer) is capable of producing high-quality synthetic speech signals and provides a scheme for modeling the natural processes of speech production and speech perception.
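The abstract does not state how a gesture is turned into movement. Purely as an illustration of what "gesture-based control" can mean, the sketch below assumes the critically damped second-order target dynamics familiar from task dynamics (Saltzman and Munhall): each gesture is active over a time interval and pulls one articulatory parameter toward its target, and the parameter relaxes to a rest value when no gesture is active. This is not claimed to be the control model of this paper. The names `Gesture`, `active_target`, `simulate`, the stiffness `omega`, and the "latest onset wins" rule for overlapping gestures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gesture:
    """Hypothetical gesture: drives one articulatory parameter toward
    `target` during the interval [onset, offset), in seconds."""
    onset: float
    offset: float
    target: float


def active_target(gestures, t, rest):
    """Return the target of the gesture active at time t, else `rest`.
    Assumption: if gestures overlap, the one with the latest onset wins."""
    active = [g for g in gestures if g.onset <= t < g.offset]
    if not active:
        return rest
    return max(active, key=lambda g: g.onset).target


def simulate(gestures, duration, dt=0.001, omega=40.0, x0=0.0, rest=0.0):
    """Integrate the critically damped second-order system
        x'' = -2*omega*x' - omega**2 * (x - target(t))
    with semi-implicit Euler steps of size dt. Returns the sampled
    positions, including the initial value (steps + 1 samples)."""
    x, v = x0, 0.0
    trajectory = [x]
    steps = int(round(duration / dt))
    for n in range(steps):
        t = n * dt
        target = active_target(gestures, t, rest)
        a = -2.0 * omega * v - omega ** 2 * (x - target)
        v += a * dt  # update velocity first ...
        x += v * dt  # ... then position using the new velocity
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Hypothetical closing gesture on a normalized parameter (0 = rest, 1 = closure).
    closing = Gesture(onset=0.05, offset=0.25, target=1.0)
    traj = simulate([closing], duration=0.6)
    print(f"position at gesture offset: {traj[250]:.3f}")
    print(f"position at end:            {traj[-1]:.6f}")
```

With `omega = 40` rad/s the time constant is about 25 ms, so a 200 ms gesture brings the parameter close to its target before releasing it. Critical damping is chosen here because it gives a smooth approach without oscillation, which is why it is a common choice in gestural models.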