EXPLOITING SUBTREES IN AUTO‐PARSED DATA TO IMPROVE DEPENDENCY PARSING

Dependency parsing has attracted considerable interest from researchers and developers in natural language processing. However, supervised techniques need a large volume of hand-annotated data to produce a high-accuracy dependency parser, and such data is extremely expensive to create. This paper presents a simple and effective approach for improving dependency parsing with subtrees derived from unannotated data, which is easy to obtain. First, we use a baseline parser to parse large-scale unannotated data. Then, we extract subtrees from the dependency parse trees in the auto-parsed data. Next, we classify the extracted subtrees into several sets according to their frequency. Finally, we design new features based on the subtree sets for parsing algorithms. To demonstrate the effectiveness of the proposed approach, we conduct experiments on the English Penn Treebank and the Chinese Penn Treebank. The results show that our approach significantly outperforms the baseline systems. It also achieves the best accuracy on the Chinese data and accuracy competitive with the best known systems on the English data.
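The four steps above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: it assumes that the auto-parsed sentences are already available as lists of (word, head index) pairs, it considers only the smallest subtrees (a single head-dependent arc), and the count thresholds and the set labels HF, MF, LF and ZERO are hypothetical choices made for illustration.

```python
from collections import Counter

# Assumed input format: each sentence is a list of (word, head_index) pairs,
# where head_index is the 0-based position of the head and -1 marks the root.
AUTO_PARSED = [
    [("I", 1), ("ate", -1), ("fish", 1)],
    [("I", 1), ("ate", -1), ("rice", 1)],
    [("we", 1), ("ate", -1), ("fish", 1)],
    [("I", 1), ("ate", -1), ("rice", 1)],
]


def extract_arc_subtrees(sentence):
    """Yield (head_word, dependent_word, direction) for every arc.

    direction is "L" if the dependent precedes its head, "R" otherwise.
    """
    for position, (word, head) in enumerate(sentence):
        if head < 0:
            continue  # the root has no head, so it contributes no arc
        direction = "L" if position < head else "R"
        yield (sentence[head][0], word, direction)


def count_subtrees(sentences):
    """Count how often each arc subtree occurs in the auto-parsed data."""
    counts = Counter()
    for sentence in sentences:
        counts.update(extract_arc_subtrees(sentence))
    return counts


def classify_subtrees(counts, high_min=3, mid_min=2):
    """Map each subtree to a frequency set label.

    The thresholds are hypothetical; any frequency-based split could be used.
    """
    labels = {}
    for subtree, count in counts.items():
        if count >= high_min:
            labels[subtree] = "HF"  # high frequency
        elif count >= mid_min:
            labels[subtree] = "MF"  # middle frequency
        else:
            labels[subtree] = "LF"  # low frequency
    return labels


def subtree_feature(labels, head_word, dependent_word, direction):
    """Return a feature string for a candidate arc; unseen subtrees get ZERO."""
    label = labels.get((head_word, dependent_word, direction), "ZERO")
    return "subtree_set=" + label


counts = count_subtrees(AUTO_PARSED)
labels = classify_subtrees(counts)
```

A parser could then call `subtree_feature(labels, "ate", "fish", "R")` while scoring a candidate arc and add the returned string to its feature vector. Using the set label rather than the raw count keeps the number of new features small, which is one plausible reason for grouping subtrees by frequency.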
