Maximum Conditional Mutual Information Weighted Scoring for Speech Recognition

This paper describes a novel approach for extending the prototype Gaussian mixture models used to represent different classes in many recognition and classification systems, and applies it to large vocabulary automatic speech recognition (ASR). The extension estimates weighting vectors that are applied to the log likelihood values contributed by the different elements of the feature vector. The weighting vectors are chosen to maximize an estimate of the conditional mutual information between the log likelihood score and a binary random variable indicating whether the log likelihood was computed using the model of the correct label. We show that, under some assumptions on the conditional probability density function (PDF) of the log likelihood score given this random variable, maximizing the differential entropy of a normalized log likelihood score is an equivalent criterion. The approach allows different features in the acoustic feature vector to be emphasized for different hidden Markov model (HMM) states. We apply the approach to the RT04 Arabic broadcast news speech recognition task and obtain a 3% relative improvement in word error rate (WER) over the baseline system.
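The idea of weighting the log likelihood contribution of each feature element can be illustrated with a minimal sketch. It assumes a single diagonal-covariance Gaussian per state (a simplification of the mixture models discussed above), for which the log likelihood splits into one term per feature dimension. The function names, the example values, and the weight vectors are hypothetical, and the weights are set by hand here rather than estimated with the conditional mutual information criterion.

```python
import math


def per_dim_log_likelihoods(x, mean, var):
    """Per-dimension log densities of a diagonal-covariance Gaussian."""
    return [
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    ]


def weighted_score(x, mean, var, weights):
    """Weighted sum of per-dimension log likelihoods.

    Assumed form for illustration: one weight per feature element,
    with a separate weight vector per HMM state.
    """
    terms = per_dim_log_likelihoods(x, mean, var)
    return sum(w * t for w, t in zip(weights, terms))


# Hypothetical 3-dimensional feature vector and state parameters.
x = [0.5, -1.0, 2.0]
mean = [0.0, 0.0, 1.0]
var = [1.0, 2.0, 0.5]

# Unit weights recover the ordinary diagonal-Gaussian log likelihood.
uniform = [1.0, 1.0, 1.0]
# Hand-picked weights that emphasize the first feature for this state.
emphasized = [1.5, 0.5, 1.0]

baseline = weighted_score(x, mean, var, uniform)
reweighted = weighted_score(x, mean, var, emphasized)
print(baseline, reweighted)
```

With unit weights the score equals the standard log likelihood, so the weighted form is a strict generalization of the baseline scoring under the stated assumption.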
