Generating small, accurate acoustic models with a modified Bayesian information criterion

Although Gaussian mixture models are commonly used in acoustic models for speech recognition, there is no standard method for determining the number of mixture components. Most systems assign the number of components arbitrarily, with little justification. Mathematically derived model selection techniques such as the Bayesian information criterion (BIC) have been applied, but these criteria focus on modeling the true distribution of each tied state (senone) individually, without considering the acoustic model as a whole; this leads to suboptimal speech recognition performance. In this paper we present a method that modifies the BIC to account for inter-senone effects, yielding statistically justified acoustic models. Experimental results in the CMU Communicator domain show that, in contrast to previous strategies, the new method generates acoustic models that are not only smaller but also achieve lower word error rate.
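For orientation, the conventional per-senone use of the BIC that the abstract contrasts against can be sketched as follows. This is a minimal illustration of Schwarz's standard criterion applied to one senone in isolation, not the modified criterion proposed in the paper, whose form is not given here. The function names, the diagonal-covariance assumption, and the penalty weight are all assumptions made for the example.

```python
import math


def gmm_param_count(num_components, dim):
    """Free parameters of a diagonal-covariance GMM (assumed form):
    one mean and one variance vector per component, plus the mixture
    weights, which sum to one and so contribute num_components - 1."""
    return num_components * 2 * dim + (num_components - 1)


def bic_score(log_likelihood, num_params, num_frames, penalty_weight=1.0):
    """Standard BIC in its 'higher is better' form:
    log-likelihood minus half the parameter count times log(sample size).
    penalty_weight is a hypothetical tuning knob; 1.0 gives the plain BIC."""
    return log_likelihood - 0.5 * penalty_weight * num_params * math.log(num_frames)


def select_num_components(log_likelihoods, dim, num_frames, penalty_weight=1.0):
    """Pick the component count with the highest BIC for a single senone.

    log_likelihoods maps a candidate component count to the training
    log-likelihood of a GMM of that size fitted to the senone's frames.
    """
    scores = {
        k: bic_score(ll, gmm_param_count(k, dim), num_frames, penalty_weight)
        for k, ll in log_likelihoods.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because each senone is scored independently here, nothing in the selection reflects how the senones discriminate against one another, which is the limitation the abstract attributes to this kind of criterion.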
