Cross-Lingual Parser Selection for Low-Resource Languages

In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores with minimal requirements. Still, selecting the single best parser for a given target language remains a challenge. Here, we propose a lean method for parser selection. It offers top performance, and does so without disadvantaging truly low-resource languages. In a realistic cross-lingual parsing experiment, we consistently select appropriate source parsers for our target languages.
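The abstract does not spell out the selection criterion. As one illustration of what parser selection for delexicalized transfer can look like, here is a minimal sketch of ranking source parsers by KLcpos3, a previously published language similarity measure (Rosa and Žabokrtský, 2015) computed over coarse POS trigrams. This is a swapped-in technique, not necessarily the method proposed here. The function names `trigram_distribution`, `kl_cpos3` and `select_parser`, the fixed smoothing `floor`, and the toy tag sequences are all hypothetical, and the target-side tags are assumed to be available (in practice they would have to be predicted).

```python
import math
from collections import Counter

# Hypothetical helper: relative frequencies of coarse POS trigrams,
# counted within each sentence (no boundary padding; an assumption).
def trigram_distribution(tag_sequences):
    counts = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    total = sum(counts.values())
    return {tri: c / total for tri, c in counts.items()}

# KL divergence of the target distribution from a source distribution.
# Trigrams unseen in the source get a small fixed probability `floor`
# (an assumed smoothing choice, not taken from the text).
def kl_cpos3(target, source, floor=1e-6):
    return sum(p * math.log(p / source.get(tri, floor))
               for tri, p in target.items())

# Hypothetical selector: pick the source language whose treebank
# POS-trigram distribution is closest (lowest KL) to the target's.
def select_parser(target_tags, source_tags_by_lang):
    target = trigram_distribution(target_tags)
    scores = {lang: kl_cpos3(target, trigram_distribution(tags))
              for lang, tags in source_tags_by_lang.items()}
    return min(scores, key=scores.get), scores

# Toy usage with made-up UPOS sequences (illustrative data only).
target_tags = [["DET", "NOUN", "VERB", "DET", "NOUN"]]
source_tags_by_lang = {
    "src_a": [["DET", "NOUN", "VERB", "DET", "NOUN"]],
    "src_b": [["NOUN", "VERB", "ADP", "NOUN", "ADJ"]],
}
best, scores = select_parser(target_tags, source_tags_by_lang)
print(best, scores)
```

KL divergence is asymmetric; summing only over trigrams observed in the target means the score needs nothing beyond a target-side tag sample, while every source contributes its full treebank distribution.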
