Incremental Tree Substitution Grammar for Parsing and Sentence Prediction

In this paper, we present the first incremental parser for Tree Substitution Grammar (TSG). A TSG allows arbitrarily large syntactic fragments to be combined into complete trees; we show how constraints (including lexicalization) can be imposed on the shape of the TSG fragments to enable incremental processing. We propose an efficient Earley-based algorithm for incremental TSG parsing and report an F-score competitive with other incremental parsers. In addition to whole-sentence F-score, we also evaluate the partial trees that the parser constructs for sentence prefixes; partial trees play an important role in incremental interpretation, language modeling, and psycholinguistics. Unlike existing parsers, our incremental TSG parser can generate partial trees that include predictions about the upcoming words in a sentence. We show that it outperforms an n-gram model in predicting more than one upcoming word.
