Best-first Word-lattice Parsing: Techniques for integrated syntactic language modeling

This thesis explores a language modeling technique based on statistical parsing. Previous research that exploits syntactic structure for language modeling has shown improved accuracy over standard trigram models. Unlike previous techniques, our parsing model performs syntactic analysis on sets of hypothesized word strings simultaneously; these sets are encoded as word-lattices, represented as weighted finite-state automata. We present a best-first word-lattice chart parsing algorithm that combines the search for good parses with the search for good strings in the word-lattice. We describe how the word-lattice parser is combined with the Charniak language model, a sophisticated syntactic language model, to provide an efficient syntactic language model. We present results for this model on a standard set of speech recognition word-lattices. Finally, we examine variations of the word-lattice parser intended to improve both performance and accuracy.
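The idea of searching for good parses and good strings at the same time can be illustrated with a minimal sketch. It is not the thesis's algorithm or the Charniak model: it assumes a toy grammar in Chomsky normal form, an acyclic lattice given as a list of arcs, costs that are non-negative negative log probabilities, and an agenda ordered by inside cost alone rather than by a tuned figure of merit. The function name, the data layout, and all example words and costs are hypothetical.

```python
import heapq
from collections import defaultdict


def best_first_lattice_parse(arcs, lexicon, binary_rules, start, final, goal="S"):
    """Return (cost, words) of the cheapest `goal` edge spanning start..final, or None.

    arcs:         list of (src_state, dst_state, word, arc_cost)
    lexicon:      word -> list of (tag, rule_cost)
    binary_rules: (left_label, right_label) -> list of (parent_label, rule_cost)
    All costs are assumed to be non-negative, so the first time an edge is
    popped from the agenda it carries its lowest possible cost.
    """
    agenda = []
    # Seed the agenda with one preterminal edge per (lattice arc, tag) pair.
    for src, dst, word, arc_cost in arcs:
        for tag, rule_cost in lexicon.get(word, []):
            heapq.heappush(agenda, (arc_cost + rule_cost, tag, src, dst, (word,)))

    chart = {}                     # (label, i, j) -> (cost, words)
    starts_at = defaultdict(list)  # lattice state -> edges beginning there
    ends_at = defaultdict(list)    # lattice state -> edges ending there

    while agenda:
        cost, label, i, j, words = heapq.heappop(agenda)
        if (label, i, j) in chart:
            continue  # a cheaper analysis of this edge was already found
        chart[(label, i, j)] = (cost, words)
        if label == goal and i == start and j == final:
            return cost, words
        starts_at[i].append((label, j, cost, words))
        ends_at[j].append((label, i, cost, words))

        # Use the new edge as a left child of edges that start where it ends.
        for r_label, k, r_cost, r_words in starts_at[j]:
            for parent, rule_cost in binary_rules.get((label, r_label), []):
                heapq.heappush(
                    agenda,
                    (cost + r_cost + rule_cost, parent, i, k, words + r_words),
                )
        # Use the new edge as a right child of edges that end where it starts.
        for l_label, h, l_cost, l_words in ends_at[i]:
            for parent, rule_cost in binary_rules.get((l_label, label), []):
                heapq.heappush(
                    agenda,
                    (l_cost + cost + rule_cost, parent, h, j, l_words + words),
                )
    return None


# Hypothetical toy lattice: two competing words on each of the last two arcs.
ARCS = [
    (0, 1, "the", 0.1),
    (1, 2, "dog", 0.7),
    (1, 2, "dock", 0.4),
    (2, 3, "barks", 0.2),
    (2, 3, "parks", 0.3),
]
LEXICON = {
    "the": [("DT", 0.1)],
    "dog": [("NN", 0.5)],
    "dock": [("NN", 1.5)],
    "barks": [("VBZ", 0.3)],
    "parks": [("VBZ", 0.9)],
}
BINARY_RULES = {
    ("DT", "NN"): [("NP", 0.2)],
    ("NP", "VBZ"): [("S", 0.1)],
}

if __name__ == "__main__":
    print(best_first_lattice_parse(ARCS, LEXICON, BINARY_RULES, start=0, final=3))
```

Because chart edges span lattice states rather than word positions, competing word hypotheses over the same pair of states share a single chart entry, and the cheapest combination of arc cost and grammar cost wins. In this toy lattice the arc for "dock" is cheaper than the arc for "dog", but the grammar costs reverse that preference.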
