Natural Language Grammar Induction Using a Constituent-Context Model

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees, based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher-quality analyses, giving the best published results on the ATIS dataset.
