Simple Semi-supervised Dependency Parsing

We present a simple and effective semi-supervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions. For example, in the case of English unlabeled second-order parsing, we improve from a baseline accuracy of 92.02% to 93.16%, and in the case of Czech unlabeled second-order parsing, we improve from a baseline accuracy of 86.13% to 87.13%. In addition, we demonstrate that our method also improves performance when small amounts of training data are available, and can roughly halve the amount of supervised data required to reach a desired level of performance.
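As a minimal, hypothetical sketch (not the authors' implementation), the Python snippet below illustrates how cluster-derived features of this kind could be attached to a candidate head-modifier arc. It assumes Brown-style bit-string clusters stored in a whitespace-separated "bits word" text file; the file format, the prefix lengths (4 and 6 bits), and the feature-template names are all assumptions made for illustration.

```python
# Illustrative sketch only: cluster-based features for a dependency arc.
# Assumes a Brown-style cluster file with lines of the form "<bitstring> <word> [count]".

def load_clusters(path):
    """Load a word -> bit-string cluster map from a whitespace-separated file."""
    clusters = {}
    with open(path) as f:
        for line in f:
            bits, word = line.split()[:2]
            clusters[word] = bits
    return clusters

def arc_features(head, modifier, clusters, prefixes=(4, 6)):
    """Return cluster-augmented feature strings for a candidate head-modifier arc."""
    feats = []
    for n in prefixes:
        hc = clusters.get(head, "UNK")[:n]       # head cluster prefix (or UNK)
        mc = clusters.get(modifier, "UNK")[:n]   # modifier cluster prefix (or UNK)
        feats.append(f"hc{n}={hc}")
        feats.append(f"mc{n}={mc}")
        feats.append(f"hc{n}={hc}_mc{n}={mc}")   # conjoined head-modifier clusters
    return feats

# Example usage (hypothetical file name):
# clusters = load_clusters("brown_clusters.txt")
# arc_features("ate", "apple", clusters)
```

Intuitively, short bit-string prefixes behave like coarse, part-of-speech-like categories, while longer prefixes approach lexical identity; which granularities are most useful would have to be tuned on development data.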