Paper information - Variational approximation of long-span language models for LVCSR

Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition (LVCSR) systems because the search space of sentence hypotheses is prohibitively large. Instead, an N-best list of hypotheses is created using tractable n-gram models and then rescored using the long-span models. This paper shows that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first-pass decoding. They not only yield a better first-pass output but also produce a lattice with a lower oracle word error rate, and rescoring the N-best list from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims.
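
The abstract does not spell out how such an approximation is constructed. One plausible reading, offered here only as an illustration, is to pick the n-gram model Q that minimizes the divergence KL(P‖Q) from the long-span model P, which can be approached by sampling text from P and estimating Q from the samples by maximum likelihood. The sketch below follows that reading under loudly stated assumptions: the cache-style toy model `long_span_next`, the four-symbol vocabulary, the bigram order and the add-k smoothing are all hypothetical stand-ins, not the models or settings used in the paper.

```python
import random
from collections import Counter, defaultdict

VOCAB = ["a", "b", "c", "</s>"]  # hypothetical toy vocabulary


def long_span_next(history):
    """Toy stand-in for a long-span model: P(w | entire sentence history).

    Assumption: a cache-style model in which words already seen in the
    sentence become more likely, so the distribution depends on the whole
    history rather than on the last n-1 words.
    """
    weights = {w: 1.0 for w in VOCAB}
    for w in history:
        weights[w] += 1.0
    # Make end-of-sentence likelier as the sentence grows.
    weights["</s>"] += 0.5 * len(history)
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_sentence(rng, max_len=20):
    """Draw one sentence from the toy long-span model (truncated at max_len)."""
    history = []
    while len(history) < max_len:
        dist = long_span_next(history)
        words = list(dist)
        w = rng.choices(words, weights=[dist[x] for x in words])[0]
        if w == "</s>":
            break
        history.append(w)
    return history


def estimate_bigram(sentences, add_k=0.01):
    """Estimate a tractable bigram model Q from sampled text.

    Relative-frequency estimation on samples drawn from P is the sampled
    counterpart of minimizing KL(P || Q) within the bigram family; add-k
    smoothing is an illustrative choice that keeps unseen events non-zero.
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1

    def prob(cur, prev):
        c = counts[prev]
        return (c[cur] + add_k) / (sum(c.values()) + add_k * len(VOCAB))

    return prob


rng = random.Random(0)
samples = [sample_sentence(rng) for _ in range(20000)]
q = estimate_bigram(samples)  # q(cur, prev) approximates the long-span model
```

The resulting `q` is an ordinary bigram model and could, in principle, be used wherever a first-pass decoder expects an n-gram model, while the original long-span model is kept for rescoring the N-best list.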

Sanjeev Khudanpur | Martin Karafiát | Tomas Mikolov | Stefan Kombrink | Anoop Deoras
