Subspace distribution clustering hidden Markov model

Most contemporary laboratory recognizers require too much memory to run and are too slow for mass applications. One major cause of the problem is the large parameter space of their acoustic models. In this paper, we propose a new acoustic modeling methodology, which we call subspace distribution clustering hidden Markov modeling (SDCHMM), with the aim of achieving much more compact acoustic models. The theory of SDCHMM is based on tying the parameters of a new unit, namely the subspace distribution, of continuous density hidden Markov models (CDHMMs). SDCHMMs can be converted from CDHMMs by projecting the distributions of the CDHMMs onto orthogonal subspaces, and then tying similar subspace distributions over all states and all acoustic models in each subspace. By exploiting the combinatorial effect of subspace distribution encoding, all original full-space distributions can be represented by combinations of a small number of subspace distribution prototypes. Consequently, there is a great reduction in the number of model parameters, and thus substantial savings in memory and computation. This makes SDCHMM very attractive for the practical implementation of acoustic models. Evaluation on the Airline Travel Information System (ATIS) task shows that, in comparison to its parent CDHMM system, a converted SDCHMM system achieves a 7- to 18-fold reduction in the memory requirement for acoustic models and runs 30% to 60% faster without any loss of recognition accuracy.
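The conversion described above (project each distribution onto subspaces, tie similar subspace distributions, then encode each full-space distribution as a combination of prototypes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the Gaussians are random with diagonal covariance, the subspaces are contiguous equal-width blocks of dimensions, and the tying step uses plain k-means on the (mean, variance) parameters as a stand-in for whatever clustering criterion the authors actually use. The function names `split_streams`, `kmeans`, and `convert_to_sdchmm` are hypothetical.

```python
import random

def split_streams(vec, num_streams):
    """Split a full-space vector into contiguous, equal-width subvectors."""
    width = len(vec) // num_streams
    return [tuple(vec[k * width:(k + 1) * width]) for k in range(num_streams)]

def nearest(point, protos):
    """Index of the prototype closest to point (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, protos[j])))

def kmeans(points, n_protos, iters=20, seed=0):
    """Plain k-means; an illustrative stand-in for the tying step."""
    rng = random.Random(seed)
    protos = rng.sample(points, n_protos)
    for _ in range(iters):
        assign = [nearest(p, protos) for p in points]
        for j in range(n_protos):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old prototype if its cluster is empty
                protos[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final assignment against the final prototypes.
    return protos, [nearest(p, protos) for p in points]

def convert_to_sdchmm(gaussians, num_streams, n_protos):
    """gaussians: list of (mean, var) pairs with diagonal covariance.

    Returns (codebooks, codes). codebooks[k] holds the subspace prototypes
    of stream k; codes[i] is the tuple of prototype indices that stands in
    for the i-th full-space Gaussian.
    """
    per_stream = [[] for _ in range(num_streams)]
    for mean, var in gaussians:
        m_parts = split_streams(mean, num_streams)
        v_parts = split_streams(var, num_streams)
        for k in range(num_streams):
            # One subspace Gaussian = its sub-mean followed by its sub-variance.
            per_stream[k].append(m_parts[k] + v_parts[k])
    codebooks, assigns = [], []
    for k in range(num_streams):
        protos, assign = kmeans(per_stream[k], n_protos, seed=k)
        codebooks.append(protos)
        assigns.append(assign)
    codes = [tuple(assigns[k][i] for k in range(num_streams))
             for i in range(len(gaussians))]
    return codebooks, codes

# Toy data (assumed): 200 random 8-dimensional diagonal Gaussians.
rng = random.Random(42)
DIM, N_GAUSS, N_STREAMS, N_PROTOS = 8, 200, 4, 16
gaussians = [([rng.gauss(0.0, 1.0) for _ in range(DIM)],
              [rng.uniform(0.5, 2.0) for _ in range(DIM)])
             for _ in range(N_GAUSS)]

codebooks, codes = convert_to_sdchmm(gaussians, N_STREAMS, N_PROTOS)

# Storage comparison: floats for untied Gaussians vs. tied prototypes.
original_floats = N_GAUSS * 2 * DIM
codebook_floats = sum(len(p) for cb in codebooks for p in cb)
index_count = N_GAUSS * N_STREAMS  # small integers, one per Gaussian per stream
print(original_floats, codebook_floats, index_count)
```

In this toy setting the untied Gaussians need one mean and one variance per dimension each, while the tied model stores only the per-stream prototypes plus one small index per Gaussian per stream. With 16 prototypes in each of 4 streams, up to 16^4 distinct full-space combinations are expressible, which is the combinatorial effect the abstract refers to. The sketch says nothing about recognition accuracy, which depends on the real data and clustering criterion.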
