Natural language grammar induction with a generative constituent-context model

We present a generative probabilistic model for the unsupervised learning of hierarchical natural language syntactic structure. Unlike most previous work, we do not learn a context-free grammar, but rather induce a distributional model of constituents which explicitly relates constituent yields and their linear contexts. Parameter search with EM produces higher-quality analyses for human language data than those previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn Treebank sentences of comparable length show an even higher constituent F1 of 71% on non-trivial brackets. We compare distributionally induced and actual part-of-speech tags as input data, and examine extensions to the basic model. We discuss errors made by the system, compare the system to previous models, and discuss upper bounds, lower bounds, and stability for this task.