Model-space MLLR for trajectory HMMs

This paper proposes a speaker adaptation technique for trajectory HMMs based on model-space maximum likelihood linear regression (mMLLR). Trajectory HMMs are derived from HMMs by imposing explicit relationships between static and dynamic features. The trajectory HMM can alleviate two limitations of the HMM, namely constant statistics within a state and the conditional independence assumption of state output probabilities, without increasing the number of model parameters. Results of continuous speech recognition experiments show that the proposed algorithm can adapt trajectory HMMs to a specific speaker and improve the performance of a trajectory HMM-based speech recognition system.
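To make the two ingredients named above concrete, the following is a minimal sketch, not the paper's algorithm. It assumes one-dimensional static features, a single delta window of the form 0.5 * (c[t+1] - c[t-1]) with zero padding at the edges, a known state sequence, and diagonal covariances. The function names are hypothetical. It shows (a) how a window matrix W ties static and dynamic features together, (b) how the mean trajectory follows from the per-frame state means, and (c) how a model-space affine transform of the form A mu + b would be applied to those means. Estimating A and b from adaptation data is the subject of the paper and is not attempted here.

```python
import numpy as np

def delta_window_matrix(T):
    """Build W (2T x T) so that W @ c interleaves static and delta features.

    Assumption: delta[t] = 0.5 * (c[t+1] - c[t-1]), zero padded at the edges.
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                  # static coefficient
        if t - 1 >= 0:
            W[2 * t + 1, t - 1] = -0.5     # delta coefficient, left neighbour
        if t + 1 < T:
            W[2 * t + 1, t + 1] = 0.5      # delta coefficient, right neighbour
    return W

def trajectory_mean(W, mu, var):
    """Solve (W^T S^-1 W) c_bar = W^T S^-1 mu for the mean trajectory c_bar.

    mu and var hold the per-frame [static, delta] means and variances picked
    out by the (assumed known) state sequence; S = diag(var).
    """
    prec = 1.0 / var
    R = W.T @ (prec[:, None] * W)
    r = W.T @ (prec * mu)
    return np.linalg.solve(R, r)

def mllr_adapt_means(mu, A, b):
    """Apply the affine mean transform mu_hat = A @ mu + b frame by frame."""
    frames = mu.reshape(-1, 2)             # one [static, delta] pair per frame
    return (frames @ A.T + b).reshape(-1)

if __name__ == "__main__":
    T = 4
    W = delta_window_matrix(T)
    # Two states, two frames each: static means 1 and 3, delta means 0.
    mu = np.array([1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 3.0, 0.0])
    var = np.ones(2 * T)
    A = np.eye(2)                          # hypothetical transform
    b = np.array([0.5, 0.0])               # hypothetical bias
    print("speaker-independent trajectory:", trajectory_mean(W, mu, var))
    print("adapted trajectory:", trajectory_mean(W, mllr_adapt_means(mu, A, b), var))
```

Because the mean trajectory is obtained by solving a linear system that couples all frames, it varies within a state even though the state means are piecewise constant, which is the property the abstract contrasts with the standard HMM.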
