Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features

We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled F-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German, and Chinese corpora, using two label mapping methods and two label set sizes.
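The abstract reports a labeled F-score computed after mapping induced labels onto gold labels, but it does not spell out either step. The following is a minimal sketch of how such an evaluation could look, not the authors' protocol. It assumes constituents are `(label, start, end)` tuples, uses a simple many-to-one mapping (each induced label goes to the gold label it most often shares a span with), and micro-averages over sentences. The function names and the mapping rule are illustrative assumptions and may differ from the two mapping methods used in the paper.

```python
from collections import Counter


def many_to_one_mapping(induced, gold):
    """Map each induced label to the gold label it most often co-occurs with.

    Assumption for illustration: co-occurrence means an identical (start, end)
    span in the same sentence. If a gold sentence has several labels on one
    span (unary chains), only the last one listed is kept.
    """
    cooccurrence = {}
    for induced_sent, gold_sent in zip(induced, gold):
        gold_by_span = {(start, end): label for label, start, end in gold_sent}
        for label, start, end in induced_sent:
            gold_label = gold_by_span.get((start, end))
            if gold_label is not None:
                cooccurrence.setdefault(label, Counter())[gold_label] += 1
    return {
        label: counts.most_common(1)[0][0]
        for label, counts in cooccurrence.items()
    }


def labeled_f_score(induced, gold, mapping=None):
    """Micro-averaged labeled bracket F-score over a list of sentences.

    A bracket counts as correct only if label, start, and end all match.
    Induced labels missing from `mapping` are left unchanged.
    """
    mapping = mapping or {}
    matched = n_induced = n_gold = 0
    for induced_sent, gold_sent in zip(induced, gold):
        induced_counts = Counter(
            (mapping.get(label, label), start, end)
            for label, start, end in induced_sent
        )
        gold_counts = Counter(gold_sent)
        # Multiset intersection counts each gold bracket at most once.
        matched += sum((induced_counts & gold_counts).values())
        n_induced += sum(induced_counts.values())
        n_gold += sum(gold_counts.values())
    precision = matched / n_induced if n_induced else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): two sentences with induced cluster labels C1..C3.
induced_trees = [
    [("C1", 0, 2), ("C2", 2, 5), ("C3", 0, 5)],
    [("C1", 0, 1), ("C2", 1, 3), ("C3", 0, 3)],
]
gold_trees = [
    [("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)],
    [("NP", 0, 1), ("VP", 1, 4), ("S", 0, 4)],
]

label_map = many_to_one_mapping(induced_trees, gold_trees)
score = labeled_f_score(induced_trees, gold_trees, label_map)
```

A one-to-one mapping, for example one found with the Hungarian method cited in the reference list, could be passed in through the same `mapping` argument in place of the many-to-one dictionary.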

Ari Rappoport | Roi Reichart

[1] Katsuhiko Nakamura. Incremental Learning of Context Free Grammars by Bridging Rule Generation and Search for Semi-optimum Rule Sets, 2006, ICGI.

[2] W. Bruce Croft. Radical Construction Grammar, 2001.

[3] Rens Bod, et al. Unsupervised Parsing with U-DOP, 2006, CoNLL.

[4] Alexander Clark, et al. Combining Distributional and Morphological Information for Part of Speech Induction, 2003, EACL.

[5] Dan Klein, et al. A Generative Constituent-Context Model for Improved Grammar Induction, 2002, ACL.

[6] Peter Grünwald, et al. A minimum description length approach to grammar inference, 1995, Learning for Natural Language Processing.

[7] Christopher D. Manning, et al. The unsupervised learning of natural language structure, 2005.

[8] H. Kuhn. The Hungarian method for the assignment problem, 1955.

[9] Noah A. Smith, et al. Annealing Structural Bias in Multilingual Weighted Grammar Induction, 2006, ACL.

[10] Simon Dennis, et al. An exemplar-based approach to unsupervised parsing, 2005.

[11] Julia Hirschberg, et al. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure, 2007, EMNLP.

[12] Menno van Zaanen, et al. Bootstrapping structure into language: alignment-based learning, 2001, ArXiv.

[13] Yoav Seginer, et al. Fast Unsupervised Incremental Parsing, 2007, ACL.

[14] J. Munkres. Algorithms for the Assignment and Transportation Problems, 1957.

[15] Walter Daelemans, et al. Memory-based lexical acquisition and processing, 1993, EAMT.

[16] Rens Bod, et al. Is the End of Supervised Parsing in Sight?, 2007, ACL.

[17] Andreas Stolcke, et al. Bayesian learning of probabilistic language models, 1994.

[18] Eytan Ruppin, et al. Unsupervised learning of natural languages, 2006.

[19] Rens Bod, et al. An All-Subtrees Approach to Unsupervised Parsing, 2006, ACL.

[20] Andreas Stolcke, et al. Inducing Probabilistic Grammars by Bayesian Model Merging, 1994, ICGI.

[21] Alexander Clark, et al. Unsupervised Language Acquisition: Theory and Practice, 2002, ArXiv.

[22] Adele E. Goldberg, et al. Constructions at Work, 2005.

[23] Katsuhiko Nakamura, et al. Incremental Learning of Context Free Grammars, 2002, ICGI.

[24] Xiaoqiang Luo, et al. On Coreference Resolution Performance Metrics, 2005, HLT.

[25] Nianwen Xue, et al. Building a Large-Scale Annotated Chinese Corpus, 2002, COLING.

[26] Thorsten Brants, et al. TnT – A Statistical Part-of-Speech Tagger, 2000, ANLP.

[27] Stanley F. Chen, et al. Bayesian Grammar Induction for Language Modeling, 1995, ACL.

[28] Jun'ichi Tsujii, et al. GENIA corpus - a semantically annotated corpus for bio-textmining, 2003, ISMB.

[29] Willem H. Zuidema, et al. Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction, 2007.

[30] Dan Klein, et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[31] Carl de Marcken, et al. Unsupervised language acquisition, 1996, ArXiv.

[32] Pat Langley, et al. Learning Context-Free Grammars with a Simplicity Bias, 2000, ECML.

[33] J. Gerard Wolff, et al. Language acquisition, data compression and generalization, 1982.

[34] Dan Klein, et al. Prototype-Driven Grammar Induction, 2006, ACL.

[35] George A. Miller, et al. Language and Communication, 1951.

[36] Georgios Paliouras, et al. e-GRIDS: Computationally Efficient Grammatical Inference from Positive Examples, 2004, Grammars.