Training of subspace distribution clustering hidden Markov model

Levinson, Juang, and Sondhi (1986) and Mak, Bocchieri, and Barnard (see Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 1997) presented novel subspace distribution clustering hidden Markov models (SDCHMMs), which can be converted from continuous density hidden Markov models (CDHMMs) by clustering subspace Gaussians in each stream over all models. Although this conversion is simple and fast, it has two drawbacks: (1) it does not exploit the smaller number of model parameters in SDCHMMs, which in theory allows them to be trained with a smaller amount of data; and (2) it involves two separate optimization steps (first training CDHMMs, then clustering subspace Gaussians), so the resulting SDCHMMs are not guaranteed to be optimal. We show how SDCHMMs may be trained directly from less speech data if we have a priori knowledge of their architecture. On the ATIS task, a speaker-independent, context-independent (CI) 20-stream SDCHMM system trained using our novel SDCHMM reestimation algorithm with only 8 minutes of speech performs as well as a CDHMM system trained using the conventional CDHMM reestimation algorithm with 105 minutes of speech.
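The conversion route described above (clustering subspace Gaussians in each stream over all models) can be illustrated with a minimal sketch. It is not the authors' procedure: the contiguous stream split, the plain Euclidean k-means on concatenated means and variances, and the names `split_streams`, `kmeans`, and `cdhmm_to_sdchmm` are all assumptions made for illustration, since the abstract does not specify the stream definition, distance measure, or clustering algorithm.

```python
import numpy as np

def split_streams(dim, n_streams):
    """Partition feature indices 0..dim-1 into contiguous streams (assumed scheme)."""
    return np.array_split(np.arange(dim), n_streams)

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means; a stand-in for whatever clustering the real system uses."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

def cdhmm_to_sdchmm(means, variances, n_streams, n_prototypes):
    """Tie diagonal Gaussians to per-stream prototypes.

    means, variances: arrays of shape (n_gaussians, dim), pooled over all
    states of all models. Returns one codebook per stream and an index table
    of shape (n_gaussians, n_streams) mapping each Gaussian to a prototype.
    """
    codebooks, indices = [], []
    for idx in split_streams(means.shape[1], n_streams):
        # Represent each subspace Gaussian by its mean and variance entries.
        sub = np.hstack([means[:, idx], variances[:, idx]])
        centers, labels = kmeans(sub, n_prototypes)
        codebooks.append(centers)
        indices.append(labels)
    return codebooks, np.stack(indices, axis=1)
```

After conversion, each original Gaussian is stored only as one prototype index per stream, and the Gaussian parameters themselves live in the shared per-stream codebooks. This sharing is the source of the smaller parameter count that the abstract refers to.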

[1] Alexander I. Rudnicky, et al. Expanding the Scope of the ATIS Task: The ATIS-3 Corpus, 1994, HLT.

[2] Alexander I. Rudnicky, et al. Multi-Site Data Collection and Evaluation in Spoken Language Understanding, 1993, HLT.

[3] Biing-Hwang Juang, et al. Maximum likelihood estimation for multivariate mixture observations of Markov chains, 1986, IEEE Trans. Inf. Theory.

[4] Janet M. Baker, et al. The Design for the Wall Street Journal-based CSR Corpus, 1992, HLT.

[5] Etienne Barnard, et al. Stream derivation and clustering schemes for subspace distribution clustering HMM, 1997.

[6] Satoshi Takahashi, et al. Four-level tied-structure for efficient representation of acoustic modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[7] Kai-Fu Lee, et al. Context-independent phonetic hidden Markov models for speaker-independent continuous speech recognition, 1990.

[8] Ponani S. Gopalakrishnan, et al. Clustering via the Bayesian information criterion with applications in speech recognition, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[9] Etienne Barnard, et al. Stream derivation and clustering scheme for subspace distribution clustering hidden Markov model, 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[10] Brian K. Mak, et al. Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model, 1998.

[11] B. Juang, et al. Context-dependent Phonetic Hidden Markov Models for Speaker-independent Continuous Speech Recognition, 2008.

[12] Giuseppe Riccardi, et al. State tying of triphone HMM's for the 1994 AT&T ARPA ATIS recognizer, 1995, EUROSPEECH.

[13] Mei-Yuh Hwang, et al. The SPHINX-II speech recognition system: an overview, 1993, Comput. Speech Lang.

[14] Brian Kan-Wing Mak, et al. Subspace distribution clustering for continuous observation density hidden Markov models, 1997, EUROSPEECH.

[15] Steve J. Young, et al. The use of state tying in continuous speech recognition, 1993, EUROSPEECH.

[16] Lalit R. Bahl, et al. A discriminant measure for model complexity adaptation, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[17] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.

[18] Giuseppe Riccardi, et al. The 1994 AT&T ATIS CHRONUS Recognizer, 1994.

[19] Biing-Hwang Juang, et al. Fundamentals of speech recognition, 1993, Prentice Hall signal processing series.

[20] Biing-Hwang Juang, et al. The segmental K-means algorithm for estimating parameters of hidden Markov models, 1990, IEEE Trans. Acoust. Speech Signal Process.

[21] Xuedong Huang, et al. Semi-continuous hidden Markov models for speech signals, 1990.