Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach

In this paper we describe a new technique for parsing free text: a transformational grammar is automatically learned that is capable of accurately parsing text into binary-branching syntactic trees with unlabelled nonterminals. The algorithm begins in a very naive state of knowledge about phrase structure. By repeatedly comparing the bracketing produced in the current state to the correct bracketing provided in the training corpus, the system learns a set of simple structural transformations that can be applied to reduce error. After describing the algorithm, we present results and compare them to other recent results in automatic grammar induction.
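The learning loop described above can be sketched as greedy, error-driven search. This is a minimal illustration under assumptions that the abstract does not state. The naive initial state is taken to be a purely right-branching tree over part-of-speech tags. Error is measured by counting predicted constituents that match the gold bracketing. The only transformation template is a hypothetical rebracketing rule that rewrites (A (B C)) as ((A B) C) when the tags on either side of the A|B boundary equal a learned pair (x, y). The names `right_branching`, `apply_rule`, `learn`, and `parse` are illustrative and are not taken from the paper, whose actual transformation set is richer.

```python
from itertools import product

# A tree is a part-of-speech tag (leaf) or a pair (left, right); internal
# nodes carry no labels, as in the unlabelled binary trees described above.


def right_branching(tags):
    """Assumed naive initial state: each tag joins everything to its right."""
    tree = tags[-1]
    for tag in reversed(tags[:-1]):
        tree = (tag, tree)
    return tree


def leaves(tree):
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def leftmost(tree):
    return tree if isinstance(tree, str) else leftmost(tree[0])


def rightmost(tree):
    return tree if isinstance(tree, str) else rightmost(tree[1])


def spans(tree, start=0):
    """Set of (i, j) token spans covered by the internal nodes of tree."""
    if isinstance(tree, str):
        return set()
    width = len(leaves(tree))
    left_width = len(leaves(tree[0]))
    return ({(start, start + width)}
            | spans(tree[0], start)
            | spans(tree[1], start + left_width))


def apply_rule(tree, rule):
    """Hypothetical template: rewrite (A (B C)) as ((A B) C) when the tags on
    either side of the A|B boundary equal rule = (x, y). One top-down pass."""
    if isinstance(tree, str):
        return tree
    a, rest = tree
    if not isinstance(rest, str):
        b, c = rest
        if (rightmost(a), leftmost(b)) == rule:
            tree = ((a, b), c)
    return (apply_rule(tree[0], rule), apply_rule(tree[1], rule))


def score(trees, gold):
    """Assumed accuracy measure: predicted constituents found in the gold trees."""
    return sum(len(spans(t) & spans(g)) for t, g in zip(trees, gold))


def learn(gold_trees, max_rules=10):
    """Greedy error-driven learning: keep the rule with the largest positive gain."""
    current = [right_branching(leaves(g)) for g in gold_trees]
    tagset = sorted({tag for g in gold_trees for tag in leaves(g)})
    learned = []
    for _ in range(max_rules):
        base = score(current, gold_trees)
        best_rule, best_gain = None, 0
        for rule in product(tagset, repeat=2):
            candidate = [apply_rule(t, rule) for t in current]
            gain = score(candidate, gold_trees) - base
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no transformation reduces error any further
            break
        learned.append(best_rule)
        current = [apply_rule(t, best_rule) for t in current]
    return learned


def parse(tags, rules):
    """Parse new text: start from the initial state, apply learned rules in order."""
    tree = right_branching(tags)
    for rule in rules:
        tree = apply_rule(tree, rule)
    return tree
```

For example, trained on the single gold tree `(("DT", "NN"), ("VBD", ("DT", "NN")))`, `learn` selects the one rule `("DT", "NN")`, and `parse(["DT", "NN", "VBD"], rules)` then groups the determiner and noun into a constituent, giving `(("DT", "NN"), "VBD")`. Because only transformations with a strictly positive net gain are kept, learning stops as soon as no candidate reduces error on the training set.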
