Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition

We introduce into speech recognition a robust classification method based on the Bayesian predictive distribution (Bayesian predictive classification, referred to as BPC). We and others have recently proposed a total Bayesian framework for speech recognition, Variational Bayesian Estimation and Clustering for speech recognition (VBEC). VBEC includes an analytical derivation, based on variational Bayes (VB), of the approximate posterior distributions that are essential for BPC. BPC using VB posterior distributions (VB-BPC) can mitigate over-training effects by marginalizing the output distribution over the model parameters. We address the sparse data problem in speech recognition and show experimentally that VB-BPC is robust against data sparseness.
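
The contrast between plug-in scoring and predictive scoring can be illustrated with a minimal sketch. It is not the VBEC implementation: it assumes a single one-dimensional Gaussian output distribution with unknown mean and precision, and it uses an exact conjugate Normal-Gamma posterior in place of a VB posterior. Under those assumptions, marginalizing the Gaussian over the posterior gives a Student-t predictive distribution, whose heavier tails are what soften over-training when data are sparse. The function names and the prior hyperparameters (mu0, kappa0, alpha0, beta0) are hypothetical choices made for this illustration.

```python
import math


def normal_gamma_posterior(data, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Conjugate Normal-Gamma posterior for a 1-D Gaussian (illustrative prior values)."""
    n = len(data)
    xbar = sum(data) / n
    s = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * s + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def predictive_logpdf(x, data, **prior):
    """Log predictive density: the Gaussian marginalized over the posterior (Student-t)."""
    mu_n, kappa_n, alpha_n, beta_n = normal_gamma_posterior(data, **prior)
    nu = 2.0 * alpha_n  # degrees of freedom
    scale2 = beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n)  # squared scale
    z2 = (x - mu_n) ** 2 / (nu * scale2)
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi * scale2)
        - 0.5 * (nu + 1.0) * math.log1p(z2)
    )


def plugin_logpdf(x, data, var_floor=1e-6):
    """Log density of a Gaussian with maximum-likelihood point estimates plugged in."""
    n = len(data)
    mean = sum(data) / n
    var = max(sum((v - mean) ** 2 for v in data) / n, var_floor)
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


if __name__ == "__main__":
    sparse = [0.0, 0.1]  # two training samples only
    x_test = 1.0         # test sample away from the training data
    print("plug-in    log-likelihood:", plugin_logpdf(x_test, sparse))
    print("predictive log-likelihood:", predictive_logpdf(x_test, sparse))
```

With only two training samples, the maximum-likelihood variance collapses and the plug-in score for a test sample away from the training data becomes extremely low, whereas the predictive score stays moderate. As the amount of training data grows, the two scores converge.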