A discriminant measure for model complexity adaptation

We present a discriminant measure that can be used to determine the model complexity in a speech recognition system. In the speech recognition process, the conditional probability of a test feature vector has to be obtained for several allophone (sub-phonetic unit) classes, using a Gaussian-mixture density model for each class. The Gaussian-mixture models are constructed from the training data belonging to the allophone classes, and the number of mixture components required to adequately model the PDF of each class is usually determined by a simple rule of thumb: the number of components has to be sufficient to model the data reasonably well, but not so large as to overmodel the data. A typical choice is to make the number of components proportional to the number of data samples. However, such methods may result in models that are sub-optimal as far as classification accuracy is concerned. We present a new discriminant measure that can be used to determine, in an objective fashion, the number of Gaussians required to best model the PDF of an allophone class. We also present the results of experiments showing the improvement in recognition performance when the number of mixture components is chosen based on the discriminant measure as opposed to the rule of thumb. These results are presented for both the speaker-independent and the speaker-adapted case.
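
As a rough illustration of the baseline the abstract describes, the sketch below scores a feature vector under a per-class Gaussian mixture and picks the number of components in proportion to the number of training samples. It does not implement the discriminant measure, which the abstract does not define. Diagonal covariances, the function names, and the constants `samples_per_component` and `max_components` are assumptions made for illustration, not details taken from the paper.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x | class) under a diagonal-covariance Gaussian mixture.

    Diagonal covariances are an assumption of this sketch.
    x: list of D floats; weights: K floats summing to 1;
    means, variances: K lists of D floats each.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log density of one diagonal Gaussian component at x.
        log_density = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_density += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(w) + log_density)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def rule_of_thumb_components(n_samples, samples_per_component=100, max_components=50):
    """Number of components proportional to the amount of training data.

    Both constants are hypothetical values chosen for this example.
    """
    return min(max_components, max(1, n_samples // samples_per_component))


def classify(x, class_models):
    """Return the class label whose mixture gives x the highest log-likelihood.

    class_models maps a label to a (weights, means, variances) tuple.
    """
    return max(class_models, key=lambda label: gmm_log_likelihood(x, *class_models[label]))
```

For example, `classify([0.1], {"a": ([1.0], [[0.0]], [[1.0]]), "b": ([1.0], [[5.0]], [[1.0]])})` selects `"a"`, since the vector lies much closer to that class mean. A rule such as `rule_of_thumb_components` looks only at the amount of data for one class and ignores how well the resulting model separates that class from the others, which is the limitation the discriminant measure is meant to address.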
