Factor analyzed HMM topology for speech recognition

This paper presents a new factor analyzed (FA) similarity measure between two Gaussian mixture models (GMMs). An adaptive hidden Markov model (HMM) topology is built to compensate for pronunciation variations in speech recognition. The idea is to evaluate whether the variation of an HMM state observed in new speech data is significant, and thereby to judge whether a new state should be generated in the models. Exploiting the effectiveness of FA for data analysis, we measure GMM similarity by estimating the common factors and specific factors embedded in the HMM means and variances. The common factors represent similar Gaussian densities, while the specific factors express the residual of the similarity measure. We perform a composite hypothesis test based on both the common factors and the specific factors. An adaptive HMM topology is accordingly established from continuously collected training utterances. Experiments show that the proposed FA measure outperforms other measures with a comparable number of parameters.
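The state-generation decision described in the abstract can be illustrated with a small sketch. This is not the FA measure or the composite hypothesis test proposed in the paper: as a stand-in, it uses the closed-form symmetric Kullback-Leibler divergence between two single Gaussians with diagonal covariances, and a plain threshold in place of the test. The function names (`kl_diag_gaussian`, `symmetric_kl`, `should_add_state`) and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math


def kl_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances.

    Each argument is a sequence of per-dimension means or variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrised divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gaussian(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussian(mean_q, var_q, mean_p, var_p))


def should_add_state(existing, candidate, threshold):
    """Hypothetical decision rule (an assumption, not the paper's test).

    `existing` and `candidate` are (mean, variance) pairs. A new state is
    proposed when the density estimated from new data differs from the
    existing state density by more than `threshold`.
    """
    return symmetric_kl(*existing, *candidate) > threshold


if __name__ == "__main__":
    old_state = ([0.0, 0.0], [1.0, 1.0])
    near_state = ([0.1, 0.0], [1.0, 1.0])
    far_state = ([2.0, -1.5], [0.5, 2.0])
    print(should_add_state(old_state, near_state, threshold=0.5))
    print(should_add_state(old_state, far_state, threshold=0.5))
```

In this sketch the threshold plays the role that the significance level of the hypothesis test would play in the proposed method. A full GMM comparison would also need to account for mixture weights and multiple components, which this single-Gaussian stand-in ignores.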
