BAYESIAN ACOUSTIC MODELING FOR SPONTANEOUS SPEECH RECOGNITION

Constructing an accurate acoustic model for spontaneous speech recognition requires handling various sources of speech fluctuation, such as speaking variations and speaker variability. The Bayesian approach is well suited to modeling these fluctuations because, unlike the maximum likelihood approach, it allows an appropriate model to be selected for the given speech data. However, the Bayesian approach involves complicated integrals, which have prevented it from being applied to large-scale tasks such as spontaneous speech recognition. In this paper, we apply a practical Bayesian framework, Variational Bayesian Estimation and Clustering for speech recognition (VBEC), to spontaneous speech recognition. In particular, we focus on selecting an appropriate acoustic model structure. The effectiveness of VBEC is demonstrated through recognition experiments on real spontaneous speech data.