A trajectory mixture density network for the acoustic-articulatory inversion mapping

This paper proposes a trajectory model which is based on a mixture density network trained with target features augmented with dynamic features together with an algorithm for estimating maximum likelihood trajectories which respects constraints between the static and derived dynamic features. This model was evaluated on an inversion mapping task. We found the introduction of the trajectory model successfully reduced root mean square error by up to 7.5%, as well as increasing correlation scores. Index Terms: acoustic-articulatory inversion, conditional trajectory model, mixture density network.

[1]  B. Atal,et al.  Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique. , 1978, The Journal of the Acoustical Society of America.

[2]  H. Wakita Estimation of vocal-tract shapes from acoustical analysis of the speech wave: The state of the art , 1979 .

[3]  Katsuhiko Shirai,et al.  Estimating articulatory motion from speech wave , 1986, Speech Commun..

[4]  W. Bastiaan Kleijn,et al.  Acoustic to articulatory parameter mapping using an assembly of neural networks , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[5]  Juergen Schroeter,et al.  Speech coding based on physiological models of speech production , 1992 .

[6]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[7]  V. Gracco,et al.  Accurate recovery of articulator positions from acoustics: new conclusions based on human data. , 1996, The Journal of the Acoustical Society of America.

[8]  Alan Wrench,et al.  Continuous speech recognition using articulatory data , 2000, INTERSPEECH.

[9]  Keiichi Tokuda,et al.  Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[10]  Korin Richmond,et al.  Estimating articulatory parameters from the acoustic speech signal , 2002 .

[11]  Simon King,et al.  Modelling the uncertainty in recovering articulation from acoustics , 2003, Comput. Speech Lang..

[12]  Keiichi Tokuda,et al.  Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis , 2004, SSW.

[13]  Keiichi Tokuda,et al.  Acoustic-to-articulatory inversion mapping with Gaussian mixture model , 2004, INTERSPEECH.

[14]  Masaaki Honda,et al.  Estimation of articulatory movements from speech acoustics using an HMM-based speech production model , 2004, IEEE Transactions on Speech and Audio Processing.