Language modeling using efficient best-first bottom-up parsing

In this paper we present a two-stage, best-first, bottom-up word-lattice parser that we use as a language model for speech recognition. The parser uses a "figure of merit" to select lattice paths and, at the same time, the syntactic category edges to parse. We also introduce a modified version of the inside-outside algorithm, which serves as a pruning stage between syntactic context-free parsing and lexicalized context-dependent parsing. We report word error rates on the HUB-1 word lattices and compare them with those of other syntactic language modeling techniques.
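As a rough illustration of the best-first idea only, the sketch below runs an agenda-driven bottom-up parser over a tiny word lattice. Everything in it is an assumption made for the example: the lattice, lexicon, and grammar are invented toy data, the name `best_first_parse` is hypothetical, and the figure of merit is simply the product of arc and rule probabilities for an edge, which is far cruder than the figure of merit the paper describes. It does not implement the second, lexicalized stage or the inside-outside pruning step.

```python
import heapq
from collections import defaultdict

# Hypothetical toy lattice: (start_node, end_node, word, acoustic_prob).
LATTICE = [
    (0, 1, "the", 0.9),
    (1, 2, "dog", 0.4),
    (1, 2, "dock", 0.6),
    (2, 3, "barks", 0.7),
    (2, 3, "parks", 0.3),
]
# Hypothetical lexicon: word -> [(category, prob)].
LEXICON = {
    "the": [("DT", 1.0)],
    "dog": [("NN", 0.5)],
    "dock": [("NN", 0.1)],
    "barks": [("VP", 0.6)],
    "parks": [("VP", 0.05)],
}
# Hypothetical binary rules: (left, right) -> [(parent, prob)].
BINARY = {
    ("DT", "NN"): [("NP", 1.0)],
    ("NP", "VP"): [("S", 1.0)],
}


def best_first_parse(lattice, lexicon, binary, start, final, goal="S"):
    """Pop the edge with the highest merit first; stop at the first goal edge."""
    agenda = []  # heapq is a min-heap, so merits are stored negated
    for i, j, word, p_acoustic in lattice:
        for cat, p_lex in lexicon.get(word, []):
            heapq.heappush(agenda, (-(p_acoustic * p_lex), cat, i, j, (word,)))

    chart = {}                     # (cat, i, j) -> (prob, words)
    by_start = defaultdict(list)   # node i -> [(cat, j)] for edges starting at i
    by_end = defaultdict(list)     # node j -> [(cat, i)] for edges ending at j

    while agenda:
        neg_merit, cat, i, j, words = heapq.heappop(agenda)
        if (cat, i, j) in chart:
            continue  # a better-scoring copy of this edge was already expanded
        prob = -neg_merit
        chart[(cat, i, j)] = (prob, words)
        if (cat, i, j) == (goal, start, final):
            return prob, list(words)
        by_start[i].append((cat, j))
        by_end[j].append((cat, i))

        # Combine with chart edges that begin where this edge ends.
        for right_cat, k in by_start[j]:
            right_prob, right_words = chart[(right_cat, j, k)]
            for parent, p_rule in binary.get((cat, right_cat), []):
                merit = prob * right_prob * p_rule
                heapq.heappush(agenda, (-merit, parent, i, k, words + right_words))

        # Combine with chart edges that end where this edge begins.
        for left_cat, h in by_end[i]:
            left_prob, left_words = chart[(left_cat, h, i)]
            for parent, p_rule in binary.get((left_cat, cat), []):
                merit = left_prob * prob * p_rule
                heapq.heappush(agenda, (-merit, parent, h, j, left_words + words))
    return None


result = best_first_parse(LATTICE, LEXICON, BINARY, start=0, final=3)
```

On this toy data the parser returns the path "the dog barks", even though the arcs for "dock" carry a higher acoustic probability than "dog". Because the chart is indexed by lattice nodes rather than word positions, choosing which edge to expand next also chooses among competing lattice paths.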
