An adaptive neural control scheme for articulatory synthesis of CV sequences

Reproducing smooth vocal tract trajectories is critical for high-quality articulatory speech synthesis. This paper presents an adaptive neural control scheme for this task that combines fuzzy logic and neural networks. The control scheme estimates motor commands from the trajectories of flesh-points on selected articulators. These motor commands are then used to reproduce the trajectories of the underlying articulators in a second-order dynamical system. Initial experiments show that the control scheme can manipulate the mass-spring-based elastic tract walls in a two-dimensional articulatory synthesizer and realize efficient speech motor control. The proposed controller achieves high accuracy during online tracking of the lips, the tongue, and the jaw in the simulation of consonant-vowel sequences. It also offers salient features, such as generality and adaptability, for the future development of control models in articulatory synthesis.
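
To make the idea of "motor commands driving a second-order articulator" concrete, the Python below is a minimal sketch under stated assumptions, not the paper's controller. It moves one hypothetical mass-spring-damper articulator along a minimum-jerk opening gesture (standing in for a consonant-to-vowel transition) using a textbook adaptive fuzzy controller: a 5 x 5 grid of Gaussian rules with center-average defuzzification (equivalent to a normalized RBF network) learns the unknown spring and damping forces online through a Lyapunov-based update. The names `simulate`, `fuzzy_basis`, `min_jerk`, and `plant_accel`, all parameter values, and the assumption that the input gain 1/M is known are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical articulator parameters in normalized units (not from the paper).
M, B, K_SPRING = 1.0, 2.0, 100.0

# Hypothetical 5 x 5 grid of Gaussian fuzzy sets over position x and velocity v.
X_CENTERS = np.linspace(-0.5, 1.5, 5)
V_CENTERS = np.linspace(-6.0, 6.0, 5)
SIG_X, SIG_V = 0.5, 3.0


def plant_accel(x, v, u):
    """Mass-spring-damper articulator: M*x'' + B*x' + K_SPRING*x = u."""
    return (u - B * v - K_SPRING * x) / M


def min_jerk(t, x0=0.0, x1=1.0, dur=0.5):
    """Minimum-jerk reference (position, velocity, acceleration), held at x1 after dur."""
    if t >= dur:
        return x1, 0.0, 0.0
    tau, d = t / dur, x1 - x0
    pos = x0 + d * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = d / dur * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    acc = d / dur**2 * (60 * tau - 180 * tau**2 + 120 * tau**3)
    return pos, vel, acc


def fuzzy_basis(x, v):
    """Normalized firing strengths of the 25 rules (product t-norm)."""
    mx = np.exp(-0.5 * ((x - X_CENTERS) / SIG_X) ** 2)
    mv = np.exp(-0.5 * ((v - V_CENTERS) / SIG_V) ** 2)
    w = np.outer(mx, mv).ravel()
    return w / w.sum()


def simulate(gamma, lam=20.0, k_s=20.0, dt=1e-3, t_end=1.5):
    """Track min_jerk with an adaptive fuzzy controller and return the errors x - r.

    Assumes the input gain 1/M is known; only the drift
    f(x, v) = -(B*v + K_SPRING*x)/M is learned online. gamma=0 disables learning.
    """
    x, v = 0.0, 0.0
    theta = np.zeros(X_CENTERS.size * V_CENTERS.size)  # rule consequents
    errors = []
    for n in range(int(round(t_end / dt))):
        r, rd, rdd = min_jerk(n * dt)
        e, ed = x - r, v - rd
        s = ed + lam * e                       # sliding variable
        xi = fuzzy_basis(x, v)
        f_hat = theta @ xi                     # fuzzy estimate of the drift term
        # Certainty-equivalence motor command: cancel f_hat, add feedforward and feedback.
        u = M * (-f_hat + rdd - lam * ed - k_s * s)
        theta = theta + dt * gamma * s * xi    # Lyapunov-based update: theta' = gamma*s*xi
        acc = plant_accel(x, v, u)
        x, v = x + dt * v, v + dt * acc        # explicit Euler step
        errors.append(e)
    return np.array(errors)


err_fixed = simulate(gamma=0.0)     # fixed (non-learning) controller
err_adapt = simulate(gamma=5000.0)  # adaptive fuzzy controller
print(f"final error, fixed: {err_fixed[-1]:+.4f}, adaptive: {err_adapt[-1]:+.4f}")
```

Without learning, the steady-state balance k_s*lam*e = -K_SPRING*(1 + e)/M leaves the articulator short of the target by e = -0.2; letting the rule consequents adapt removes that offset. Assuming the input gain is known keeps the adaptation law to a single line; relaxing that assumption would require a second approximator or a different control structure.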
