Canonical state models for automatic speech recognition

Current speech recognition systems are often based on HMMs with state-clustered Gaussian Mixture Models (GMMs) to represent the context-dependent output distributions. Though highly successful, the standard form of model does not exploit any relationships between the states: each state has its own separate model parameters. This paper describes a general class of model where the context-dependent state parameters are a transformed version of one, or more, canonical states. A number of published models sit within this framework, including semi-continuous HMMs, subspace GMMs and the HMM error model. A set of preliminary experiments is described, illustrating some of this model’s properties using CMLLR transformations from the canonical state to the context-dependent state.
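As a rough illustration of the idea of deriving a context-dependent state from a canonical state, the sketch below applies a CMLLR-style affine transform to an observation and scores it against a shared GMM, adding the log-determinant term that a constrained transform requires. It is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a single canonical state, 2-dimensional features, diagonal covariances and one transform (A, b) per context-dependent state, and all function and parameter names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def canonical_log_likelihood(x, weights, means, variances):
    """Log likelihood of x under the shared (canonical) GMM."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def state_log_likelihood(o, A, b, weights, means, variances):
    """Log likelihood of observation o for one context-dependent state.

    Assumption: the state is the canonical GMM seen through a 2x2
    CMLLR-style transform, so
        log p(o | s) = log|det A| + log p_canonical(A o + b).
    """
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = [
        A[0][0] * o[0] + A[0][1] * o[1] + b[0],
        A[1][0] * o[0] + A[1][1] * o[1] + b[1],
    ]
    return math.log(abs(det)) + canonical_log_likelihood(
        x, weights, means, variances
    )
```

With the identity transform and a zero bias, the context-dependent state reduces to the canonical state itself. Each additional state then costs only one transform rather than a full set of Gaussian parameters, which is the kind of parameter sharing the abstract refers to.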
