Accurate acoustic modeling for spontaneous speech recognition must account for sources of speech fluctuation such as speaking variations and speaker variances. The Bayesian approach is well suited to modeling such fluctuation because, unlike the maximum likelihood approach, it can select a model appropriate to the given speech data. However, the Bayesian approach involves complicated integrals, which have prevented its use in large-scale tasks such as spontaneous speech recognition. In this paper, we apply a practical Bayesian framework, Variational Bayesian Estimation and Clustering for speech recognition (VBEC), to spontaneous speech recognition. In particular, we focus on selecting an appropriate acoustic model structure. The effectiveness of VBEC is shown through recognition experiments on real spontaneous speech data.