Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge

This paper studies cross-lingual transfer for dependency parsing, focusing on very low-resource settings where delexicalized transfer is the only fully automatic option. We show how to boost parsing performance by rewriting the source sentences so that they better match the linguistic regularities of the target language. We contrast a data-driven approach with an approach relying on linguistically motivated rules automatically extracted from the World Atlas of Language Structures (WALS). Experiments on 40 languages show that both approaches greatly outperform the baseline, with the knowledge-driven method yielding the best accuracies: +2.9 UAS on average, and up to +90 UAS (absolute) on some frequent PoS configurations.
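
As an illustration of what rewriting a delexicalized source sentence can look like, here is a minimal sketch of a single reordering rule. It is not the paper's implementation: the token representation, the function names `swap_adjacent` and `move_adj_after_noun`, and the choice of rule are all assumptions made for the example. The rule mimics the kind of word-order fact a typological database records (for instance, whether adjectives precede or follow nouns): when the target language places adjectives after nouns, each adjective that directly precedes its head noun in the source is moved after it, and head indices are remapped so the tree stays consistent.

```python
from typing import List, Tuple

# A delexicalized token: (UPOS tag, head index), heads are 1-based, 0 = root.
Token = Tuple[str, int]


def swap_adjacent(sent: List[Token], i: int) -> List[Token]:
    """Swap the tokens at 0-based positions i and i+1, remapping head indices."""
    a, b = i + 1, i + 2  # 1-based ids of the two tokens being swapped
    remap = {a: b, b: a}
    out = [(pos, remap.get(head, head)) for pos, head in sent]
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def move_adj_after_noun(sent: List[Token]) -> List[Token]:
    """Hypothetical rule: move an ADJ that directly precedes its NOUN head after it."""
    i = 0
    while i < len(sent) - 1:
        pos, head = sent[i]
        next_pos, _ = sent[i + 1]
        if pos == "ADJ" and next_pos == "NOUN" and head == i + 2:
            sent = swap_adjacent(sent, i)
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return sent


# Source order "DET ADJ NOUN VERB", e.g. "the red car runs".
source = [("DET", 3), ("ADJ", 3), ("NOUN", 4), ("VERB", 0)]
rewritten = move_adj_after_noun(source)
```

A delexicalized parser trained on the rewritten sentences would then see PoS sequences closer to those of a noun-adjective target language. The sketch handles only one adjective adjacent to its noun; stacked modifiers would need a more general rule.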
