Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels

We describe an approach to robust, domain-independent syntactic parsing of unrestricted, naturally occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the grammar's coverage of several corpora and report the results of a parsing experiment using probabilities derived from bracketed training data. We also report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally occurring punctuation marks.
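As a rough illustration of the with/without-punctuation comparison, the sketch below builds the two label sequences that a tag-level parser might receive for the same sentence. The tag names, the `PUNCT_LABELS` set, and the `label_sequence` helper are all assumptions made for this example; they are not the tagset or preprocessing used in the work described here.

```python
# Hypothetical set of punctuation labels (an assumption for illustration only).
PUNCT_LABELS = {",", ".", ":", ";", "(", ")", "-", "?", "!"}


def label_sequence(tagged, keep_punctuation=True):
    """Return the label sequence for a tagged sentence.

    `tagged` is a list of (word, label) pairs. The words are discarded,
    because the parser in this setting sees only labels. When
    `keep_punctuation` is False, punctuation labels are dropped as well.
    """
    return [
        label
        for _word, label in tagged
        if keep_punctuation or label not in PUNCT_LABELS
    ]


# Illustrative tagged sentence; the tags are placeholders.
tagged = [
    ("However", "RB"), (",", ","), ("the", "DT"),
    ("parser", "NN"), ("failed", "VBD"), (".", "."),
]

with_punct = label_sequence(tagged)
without_punct = label_sequence(tagged, keep_punctuation=False)
```

Feeding both sequences to the same parser and comparing the resulting analyses is one way the contribution of punctuation could be measured under these assumptions.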
