Learning to speak. Sensori-motor control of speech movements

Abstract: This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e., coordinate its movements so that it utters meaningful sequences of sounds belonging to a given language. This complex learning procedure is accomplished in four major steps: (a) a babbling phase, where the device builds up a model of the forward transforms, i.e., the articulatory-to-audio-visual mapping; (b) an imitation stage, where it tries to reproduce a limited set of sound sequences by audio-visual-to-articulatory inversion; (c) a “shaping” stage, where phonemes are associated with the most efficient available sensori-motor representation; and finally, (d) a “rhythmic” phase, where it learns the appropriate coordination of the activations of these sensori-motor targets.
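
Steps (a) and (b) can be illustrated with a minimal sketch. It is not the paper's model: `toy_forward` is a hypothetical linear stand-in for an articulatory synthesizer, the two articulatory parameters and the two pseudo-formant outputs are invented for illustration, and inversion is done by a simple nearest-neighbour lookup over babbled samples rather than by whatever inversion method the authors use. The "shaping" and "rhythmic" stages are not covered.

```python
import random


def toy_forward(articulation):
    """Hypothetical stand-in for an articulatory synthesizer (assumption).

    Maps two articulatory parameters, each in [0, 1], to two
    pseudo-formant values in Hz.
    """
    jaw, tongue = articulation
    f1 = 300.0 + 500.0 * jaw
    f2 = 800.0 + 1400.0 * tongue - 200.0 * jaw
    return (f1, f2)


def babble(n_samples, seed=0):
    """Babbling: try random articulations and store what each one sounds like."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        articulation = (rng.random(), rng.random())
        pairs.append((articulation, toy_forward(articulation)))
    return pairs


def invert(target, pairs):
    """Imitation: return the babbled articulation whose sound is closest to target."""
    def squared_distance(sound):
        return sum((a - b) ** 2 for a, b in zip(sound, target))

    best_articulation, _ = min(pairs, key=lambda pair: squared_distance(pair[1]))
    return best_articulation


# Babble, then try to imitate a sound produced by an unknown articulation.
memory = babble(2000)
target_sound = toy_forward((0.3, 0.7))
recovered = invert(target_sound, memory)
reproduced = toy_forward(recovered)
print("target:", target_sound, "reproduced:", reproduced)
```

The lookup table plays the role of the learned forward model: the more the device babbles, the more densely the sensory space is covered and the closer the imitation can get. A real system would replace the table with a learned, generalizing mapping.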
