A Generative Constituent-Context Model for Improved Grammar Induction

We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn Treebank sentences of comparable length show an even higher F1 of 71% on non-trivial brackets. We compare distributionally induced and actual part-of-speech tags as input data, and examine extensions to the basic model. We discuss errors made by the system, compare the system to previous models, and discuss upper bounds, lower bounds, and stability for this task.
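
The F1 figure above is computed over unlabeled brackets. As a rough illustration of that kind of metric, the sketch below scores a guessed bracketing against a gold one for a single sentence. It assumes that a "non-trivial" bracket is a span covering more than one word but not the whole sentence; that definition, the function names `nontrivial` and `bracket_f1`, and the per-sentence scoring are assumptions made for this example and are not taken from the abstract.

```python
def nontrivial(brackets, n):
    """Keep spans (i, j) over an n-word sentence that cover more than one
    word and are not the full-sentence span (assumed definition)."""
    return {(i, j) for (i, j) in brackets if j - i > 1 and (i, j) != (0, n)}


def bracket_f1(gold, guess, n):
    """Unlabeled bracket F1 for one sentence, ignoring trivial brackets."""
    g = nontrivial(gold, n)
    p = nontrivial(guess, n)
    matched = len(g & p)
    if matched == 0:
        # Covers empty bracket sets as well as no overlap.
        return 0.0
    precision = matched / len(p)
    recall = matched / len(g)
    return 2 * precision * recall / (precision + recall)


# Spans are half-open word-index pairs over a 5-word sentence.
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
guess = {(0, 5), (0, 2), (1, 5), (3, 5)}
score = bracket_f1(gold, guess, 5)
```

Here two of the three non-trivial guessed brackets match the gold ones, so precision and recall are both 2/3. A corpus-level score would normally pool the matched, guessed, and gold counts over all sentences before computing F1, rather than averaging per-sentence values.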
