Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data

The growing body of work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme and from the linguistic properties of the languages themselves. We propose a method to evaluate the effect of a language's word order on dependency parsing performance while controlling for confounding treebank properties. The method uses artificially generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data for twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.
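To make the notion of a permuted treebank concrete, the following is a minimal sketch of how reordering the words of one sentence changes its total dependency length while leaving the head-dependent pairs untouched. It is an illustration only, not the paper's permutation procedure: the head-array encoding and the function names `total_dependency_length` and `permute_tree` are assumptions introduced here.

```python
from typing import List


def total_dependency_length(heads: List[int]) -> int:
    """Sum of |position(dependent) - position(head)| over all non-root words.

    Assumed encoding: heads[i] is the 1-based position of the head of
    word i+1, and 0 marks the root.
    """
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)


def permute_tree(heads: List[int], order: List[int]) -> List[int]:
    """Reorder the words of a sentence, keeping every head-dependent pair.

    `order` lists the original 1-based positions in their new
    left-to-right order. Returns the head array of the permuted sentence.
    """
    # Map each original position to its position in the new order.
    new_pos = {old: new for new, old in enumerate(order, start=1)}
    new_heads = [0] * len(heads)
    for old, head in enumerate(heads, start=1):
        new_heads[new_pos[old] - 1] = 0 if head == 0 else new_pos[head]
    return new_heads


# Toy sentence of four words: word 2 is the root, words 1, 3 and 4 depend on it.
original = [2, 0, 2, 2]

# Hypothetical permutation that moves the root to the front of the sentence.
permuted = permute_tree(original, [2, 1, 3, 4])

print(total_dependency_length(original))
print(total_dependency_length(permuted))
```

Because the tree itself is unchanged, any difference between the two printed totals comes from word order alone, which is the kind of contrast a minimal pair of sentences is meant to isolate.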
