A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling

Discriminative language modeling provides a mechanism for differentiating between competing word hypotheses, which are usually ignored in traditional maximum likelihood estimation of N-gram language models. Discriminative language modeling usually requires manually transcribed data, which can be costly and slow to obtain. On the other hand, vast amounts of untranscribed speech data are available, to which offline adaptation techniques can be applied to generate pseudo-truth transcriptions as an approximation to manual transcription. Viewing manual and pseudo-truth transcriptions as two domains, we perform domain adaptation on the discriminative language models via a hierarchical Bayesian framework, in which the domain-specific models share a common prior model. The domain-specific and prior models are then estimated jointly from the training data. In N-best list rescoring experiments, the hierarchical Bayesian approach yielded better recognition performance than a model trained only on manual transcriptions, and it is robust against an inferior prior.
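The joint estimation the abstract describes can be sketched as a hierarchical Gaussian prior over domain-specific log-linear model weights, in the spirit of Finkel & Manning's hierarchical Bayesian domain adaptation: each domain's weight vector is regularized toward a shared prior mean, which is itself estimated jointly. The toy logistic loss, the two synthetic "manual"/"pseudo" domains, and all variance hyperparameters below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5  # feature dimension (toy)

def make_domain(w_true, n):
    """Toy binary data: labels from a noisy linear rule (assumption)."""
    X = rng.normal(size=(n, D))
    y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)
    return X, y

# "manual" and "pseudo" transcriptions play the role of the two domains;
# their true weights are perturbations of a shared underlying model.
w_shared = rng.normal(size=D)
domains = {
    "manual": make_domain(w_shared + 0.1 * rng.normal(size=D), 200),
    "pseudo": make_domain(w_shared + 0.3 * rng.normal(size=D), 800),
}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(w_d, w_star, X, y, sigma_d2, sigma_star2, n_domains):
    """Gradients of the domain log-loss plus hierarchical Gaussian priors:
    w_d ~ N(w_star, sigma_d2 * I), w_star ~ N(0, sigma_star2 * I)."""
    p = sigmoid(X @ w_d)
    g_d = X.T @ (p - y) / len(y) + (w_d - w_star) / sigma_d2
    g_star = (w_star - w_d) / sigma_d2 + w_star / (sigma_star2 * n_domains)
    return g_d, g_star

# Joint estimation: domain-specific weights and the shared prior mean
# are updated together by gradient descent.
w = {d: np.zeros(D) for d in domains}
w_star = np.zeros(D)
lr, sigma_d2, sigma_star2 = 0.5, 1.0, 1.0
for _ in range(300):
    g_star_total = np.zeros(D)
    for d, (X, y) in domains.items():
        g_d, g_star = grads(w[d], w_star, X, y,
                            sigma_d2, sigma_star2, len(domains))
        w[d] -= lr * g_d
        g_star_total += g_star
    w_star -= lr * g_star_total

acc = {d: float(((sigmoid(X @ w[d]) > 0.5) == y).mean())
       for d, (X, y) in domains.items()}
print(acc)
```

The variances `sigma_d2` control how strongly each domain is tied to the prior; shrinking them for the pseudo-truth domain would express lower trust in its noisier transcriptions.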
