Variational approximation of long-span language models for LVCSR

Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition (LVCSR) systems, because the search space of sentence hypotheses is prohibitively large. Instead, an N-best list of hypotheses is generated with tractable n-gram models and then rescored with the long-span models. This paper shows that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first-pass decoding. They not only yield a better first-pass output, but also produce lattices with a lower oracle word error rate, and rescoring the N-best lists drawn from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims.
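The abstract does not say how the variational approximation is obtained. As a minimal sketch, assume the tractable model Q is chosen from the n-gram family to minimize the Kullback-Leibler divergence from the long-span model P; under that assumption, Q can be approximated by maximum-likelihood estimation on sentences sampled from P. Everything below is hypothetical and for illustration only: `long_span_next` is a toy stand-in for a long-span model, the vocabulary has two words, and the bigram model is left unsmoothed.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"

def long_span_next(history):
    """Toy stand-in for a long-span model: P(next word | entire history).

    The distribution depends on the whole history (its length and how many
    times "a" has occurred), which a fixed-order n-gram cannot represent exactly.
    """
    p_eos = min(0.9, 0.2 + 0.1 * len(history))
    w_a = 1.0 / (2 + history.count("a"))
    return {"a": (1 - p_eos) * w_a, "b": (1 - p_eos) * (1 - w_a), EOS: p_eos}

def sample_sentence(rng, max_len=50):
    """Draw one sentence from the toy long-span model by ancestral sampling."""
    history = []
    while len(history) < max_len:
        dist = long_span_next(history)
        words = list(dist)
        word = rng.choices(words, weights=[dist[w] for w in words])[0]
        if word == EOS:
            break
        history.append(word)
    return history

def fit_bigram(sentences):
    """Unsmoothed maximum-likelihood bigram model estimated from the samples."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        padded = [BOS] + sentence + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return {
        prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
        for prev, ctr in counts.items()
    }

rng = random.Random(0)
samples = [sample_sentence(rng) for _ in range(20000)]
bigram = fit_bigram(samples)  # tractable approximation, usable in a first pass
```

The resulting `bigram` table is an ordinary n-gram model, so in principle it could replace a standard n-gram model in first-pass decoding while the original long-span model is kept for rescoring. A practical system would use a higher order and smoothing, which this sketch omits.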
