Acoustic-Articulatory Modelling with the Trajectory HMM

In this letter, we introduce an HMM-based inversion system to recover articulatory movements from speech acoustics. Trajectory HMMs are used as generative models for modelling articulatory data. Experiments on the MOCHA-TIMIT corpus indicate that the jointly trained acoustic-articulatory models are more accurate (lower RMS error) than the separately trained ones, and that trajectory HMM training results in greater accuracy compared with conventional maximum likelihood HMM training. Moreover, the system has the ability to synthesize articulatory movements directly from a textual representation.
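At the core of the trajectory HMM is maximum-likelihood parameter generation: recovering a smooth static trajectory from per-frame Gaussian targets over static and delta features. The following is a minimal one-dimensional sketch of that step; the function name, the delta window Δc_t = 0.5(c_{t+1} − c_{t−1}) with zero padding at the utterance boundaries, and all numeric values are illustrative assumptions, not details taken from the paper.

```python
def mlpg_1d(mu_static, mu_delta, prec_static, prec_delta):
    """Solve the normal equations W' P W c = W' P mu for a 1-D
    static trajectory c, where W stacks the static (identity) and
    delta (0.5 * (c_{t+1} - c_{t-1})) windows and P holds the
    per-frame precisions (inverse variances)."""
    T = len(mu_static)
    A = [[0.0] * T for _ in range(T)]  # dense solve is fine at toy scale
    b = [0.0] * T
    # Static rows: the observation is c_t itself.
    for t in range(T):
        A[t][t] += prec_static[t]
        b[t] += prec_static[t] * mu_static[t]
    # Delta rows: 0.5*c_{t+1} - 0.5*c_{t-1}, zero-padded at the edges.
    for t in range(T):
        window = []
        if t - 1 >= 0:
            window.append((t - 1, -0.5))
        if t + 1 < T:
            window.append((t + 1, 0.5))
        for i, wi in window:
            b[i] += wi * prec_delta[t] * mu_delta[t]
            for j, wj in window:
                A[i][j] += wi * prec_delta[t] * wj
    # Gauss-Jordan elimination; A is symmetric positive definite
    # whenever the static precisions are positive.
    for k in range(T):
        piv = A[k][k]
        for j in range(k, T):
            A[k][j] /= piv
        b[k] /= piv
        for i in range(T):
            if i != k and A[i][k] != 0.0:
                f = A[i][k]
                for j in range(k, T):
                    A[i][j] -= f * A[k][j]
                b[i] -= f * b[k]
    return b

# A step change in the static targets is smoothed out because the
# delta targets (all zero here) penalise abrupt movement:
traj = mlpg_1d([0.0, 0.0, 1.0, 1.0], [0.0] * 4, [1.0] * 4, [4.0] * 4)
# → [0.2, 0.2, 0.4, 0.6]
```

With the delta precisions set to zero the solution collapses to the static means; raising them trades fidelity to the static targets for smoothness, which is exactly why the trajectory model yields more realistic articulator movements than frame-independent generation.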
