Robust Constituent-to-Dependency Conversion for English

This paper presents a robust method for converting Penn Treebank-style constituent trees into dependency trees across several English corpora. Conversion tools for English already exist, but they are often tailored so closely to a specific corpus that they do not work as well on other corpora that introduce new POS tags or annotation schemes. To improve portability, we built a new conversion tool designed to produce more robust results across corpora. In particular, we modified the treatment of head-percolation rules, function tags, coordination, gapping, and empty category mappings. We compare our method with the LTH conversion tool used for the CoNLL’07-09 shared tasks. For our experiments, we use six English corpora from OntoNotes release 4.0. To measure the impact of our approach on parsing, we train and test two state-of-the-art dependency parsers, MaltParser and MSTParser, together with our own parser, ClearParser, on output converted by both the LTH tool and our method. Our results show that our method removes certain unnecessary nonprojective dependencies and generates fewer unclassified dependencies. On average across these corpora, all three parsers achieve higher parsing accuracy when trained on data generated by our method, especially on semantic dependencies.
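To make the notion of a nonprojective dependency concrete, the following is a minimal sketch of a check for crossing arcs in a dependency tree. It is not the conversion tool described here; the function names and the head-array encoding (one head index per token, 0 for an artificial root placed before the sentence) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def crossing_pairs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return pairs of dependents whose incoming arcs cross.

    heads[i - 1] is the head of token i (tokens are 1-based);
    a head of 0 denotes an artificial root placed before token 1.
    This encoding is an assumption made for this sketch.
    """
    # Represent each arc by its left end, right end, and dependent index.
    arcs = [(min(h, d), max(h, d), d) for d, h in enumerate(heads, start=1)]
    pairs = []
    for l1, r1, d1 in arcs:
        for l2, r2, d2 in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the span of the first.
            if l1 < l2 < r1 < r2:
                pairs.append((d1, d2))
    return pairs


def is_projective(heads: List[int]) -> bool:
    """A tree is treated as projective here when no two arcs cross."""
    return not crossing_pairs(heads)


if __name__ == "__main__":
    # Token 2 is the root; tokens 1 and 3 attach to it. No arcs cross.
    print(is_projective([2, 0, 2]))
    # Token 3 is the root; the arc 4 -> 2 crosses the arc 3 -> 1.
    print(is_projective([3, 4, 0, 3]))
```

Counting the trees for which `is_projective` returns `False` in the output of two converters would be one simple way to compare how many nonprojective structures each produces.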
