Synthetic Treebanking for Cross-Lingual Dependency Parsing

How do we parse languages for which no treebanks are available? This contribution takes a cross-lingual view of statistical dependency parsing, in which we use treebanks from resource-rich source languages to build and adapt models for under-resourced target languages. We outline the benefits and indicate the drawbacks of the current major approaches. We emphasize synthetic treebanking: the automatic creation of target language treebanks by means of annotation projection and machine translation. We present competitive results in cross-lingual dependency parsing, obtained with a combination of techniques that each contribute to the overall success of the method. We further include a detailed discussion of the impact of part-of-speech label accuracy on parsing results, which provides guidance for practical applications of cross-lingual methods to truly under-resourced languages.
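
To make the idea of annotation projection concrete, the following is a minimal sketch, not the method used in this work. It assumes a one-to-one word alignment between a parsed source sentence and its translation, and copies each dependency head across that alignment. The function name `project_heads`, the 1-based token indexing with 0 as the root, and the example sentences are all illustrative assumptions; real projection must also handle one-to-many, many-to-one, and unaligned tokens, which this sketch does not attempt beyond leaving such heads unset.

```python
from typing import Dict, List, Optional


def project_heads(
    source_heads: List[int],
    alignment: Dict[int, int],
    target_len: int,
) -> List[Optional[int]]:
    """Project dependency heads from a source sentence onto a target sentence.

    Tokens are 1-indexed and head 0 denotes the artificial root.
    `source_heads[i - 1]` is the head of source token i.
    `alignment` maps a source token index to its single aligned target token
    (hypothetical one-to-one alignment; an assumption of this sketch).
    Returns one head per target token, or None where nothing could be projected.
    """
    target_heads: List[Optional[int]] = [None] * target_len
    for src_dep, tgt_dep in alignment.items():
        src_head = source_heads[src_dep - 1]
        if src_head == 0:
            # The root attachment carries over directly.
            target_heads[tgt_dep - 1] = 0
        elif src_head in alignment:
            # Map the source head to its aligned target token.
            target_heads[tgt_dep - 1] = alignment[src_head]
        # Otherwise the source head is unaligned and the target head stays None.
    return target_heads


# Source: "I saw her" with heads I -> saw, saw -> root, her -> saw.
source_heads = [2, 0, 2]
# Illustrative target word order "ich sie sah": I -> ich, saw -> sah, her -> sie.
alignment = {1: 1, 2: 3, 3: 2}
projected = project_heads(source_heads, alignment, target_len=3)
```

In this toy case both target arguments attach to the verb in third position, and the verb attaches to the root. Target tokens that remain `None` would need a fallback strategy before the projected tree could serve as training data.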
