Bayesian Context Clustering Using Cross Validation for Speech Recognition

This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique that estimates reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition and performs well. Moreover, the Bayesian approach can select an appropriate model structure while taking the amount of training data into account. Prior distributions, which represent prior information about model parameters, affect both the estimation of posterior distributions and the selection of model structure (e.g., decision-tree-based context clustering), so determining them is an important problem. However, this problem has not been thoroughly investigated in speech recognition, and existing techniques for determining prior distributions have not performed well. The proposed method determines reliable prior distributions without any tuning parameters and selects an appropriate model structure while taking the amount of training data into account. Continuous phoneme recognition experiments show that the proposed method achieves higher performance than conventional methods.
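The abstract does not give the algorithm itself, so the following is only a toy sketch of the general idea: score a candidate decision-tree split by a Bayesian marginal likelihood whose prior is derived from cross-validation folds rather than hand-tuned. Everything here is an assumption made for illustration and is not taken from the paper: one-dimensional data, a Gaussian with known variance, a conjugate Gaussian prior on the mean, and the hypothetical function names `log_marginal`, `cv_score`, and `split_gain`.

```python
import math

def log_marginal(xs, mu0, kappa0, var=1.0):
    """Log marginal likelihood of xs under x ~ N(mu, var),
    with conjugate prior mu ~ N(mu0, var / kappa0)."""
    n = len(xs)
    mean = sum(xs) / n
    scatter = sum((x - mean) ** 2 for x in xs)
    shrink = kappa0 * n / (kappa0 + n)
    return (-0.5 * n * math.log(2 * math.pi * var)
            + 0.5 * math.log(kappa0 / (kappa0 + n))
            - (scatter + shrink * (mean - mu0) ** 2) / (2 * var))

def cv_score(xs, k=3, var=1.0):
    """Sum over folds of the held-out log marginal likelihood.
    Assumption: the prior for each fold is built from the other
    folds (mu0 = their mean, kappa0 = their sample count)."""
    total = 0.0
    for fold in range(k):
        held = [x for i, x in enumerate(xs) if i % k == fold]
        train = [x for i, x in enumerate(xs) if i % k != fold]
        if not held or not train:
            continue  # skip folds that cannot be scored
        mu0 = sum(train) / len(train)
        total += log_marginal(held, mu0, float(len(train)), var)
    return total

def split_gain(yes, no, k=3, var=1.0):
    """Cross-validated score of the split node minus the unsplit node."""
    return cv_score(yes, k, var) + cv_score(no, k, var) - cv_score(yes + no, k, var)
```

In this sketch a split is accepted when `split_gain` is positive. Two well-separated groups give a positive gain, while dividing a handful of similar samples gives a negative one, because each child node then has less data to support its prior. No threshold parameter is involved in that decision.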

Heiga Zen | Yoshihiko Nankaku | Keiichi Tokuda | Akinobu Lee | Kei Hashimoto
