Paper information - Corpus-Based Induction of Syntactic Structure: Models of Constituency and Dependency


The task of statistically inducing hierarchical syntactic structure over unannotated sentences of natural language has received a great deal of attention (Carroll and Charniak, 1992a; Pereira and Schabes, 1992; Brill, 1993; Stolcke and Omohundro, 1994). Researchers have explored this problem for a variety of reasons: to argue empirically against the poverty of the stimulus (Clark, 2001), to use induction systems as a first stage in constructing large treebanks (van Zaanen, 2000), to build better language models (Baker, 1979; Chen, 1995), and to examine psychological issues in language learning (Solan et al., 2003). An important distinction should be drawn between work primarily interested in the weak generative capacity of models, where modeling hierarchical structure is only useful insofar as it leads to improved models over observed structures (Baker, 1979; Chen, 1995), and work interested in the strong generative capacity of models, where the unobserved structure itself is evaluated (van Zaanen, 2000; Clark, 2001; Klein and Manning, 2002). This paper falls into the latter category; we will be inducing models of linguistic constituency and dependency with the goal of recovering linguistically plausible structures. We make no claims as to the cognitive plausibility of the induction mechanisms we present here; however, the ability of these systems to recover substantial linguistic patterns from surface yields alone does speak to the strength of support for these patterns in the data, and hence undermines arguments based on “the poverty of the stimulus” (Chomsky, 1965).

Christopher D. Manning | D. Klein

[1] Zellig S. Harris, et al. Methods in structural linguistics, 1952.

[2] Noam Chomsky, et al. Aspects of the theory of syntax, 1965.

[3] J. Baker. Trainable grammars for speech recognition, 1979.

[4] Pat Langley, et al. A Production System Model of First Language Acquisition, 1980, COLING.

[5] J. Wolff. Learning Syntax and Meanings Through Optimization and Distributional Analysis, 1988.

[6] I. M. Schlesinger, et al. Categories and Processes in Language Acquisition, 1990.

[7] Fernando Pereira, et al. Inside-Outside Reestimation From Partially Bracketed Corpora, 1992, HLT.

[8] Glenn Carroll, et al. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora, 1992.

[9] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[10] Eric Brill, et al. Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach, 1993, ACL.

[11] Andreas Stolcke, et al. Inducing Probabilistic Grammars by Bayesian Model Merging, 1994, ICGI.