KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer

We present KLcpos3, a language similarity measure based on the Kullback-Leibler divergence of coarse part-of-speech tag trigram distributions in tagged corpora. It is designed for multilingual delexicalized parsing, both for source treebank selection in single-source parser transfer and for source treebank weighting in multi-source transfer. In the selection task, KLcpos3 identifies the best source treebank in 8 out of 18 cases. In the weighting task, it yields an absolute improvement of 4.5% UAS over unweighted parse tree combination.
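As a rough illustration of the idea, the sketch below estimates coarse POS tag trigram distributions from two tagged corpora and computes the Kullback-Leibler divergence of the target distribution from the source distribution. It is a minimal sketch, not the authors' implementation: the function names (`trigram_counts`, `kl_cpos3`, `select_source`), the input format (one list of coarse tags per sentence), and the handling of trigrams unseen in the source (counted as if seen once) are all assumptions made here for the example.

```python
import math
from collections import Counter


def trigram_counts(sentences):
    """Count coarse POS tag trigrams over a corpus given as lists of tags."""
    counts = Counter()
    for tags in sentences:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts


def kl_cpos3(target_sentences, source_sentences):
    """KL divergence of the target trigram distribution from the source one.

    Assumption for this sketch: a trigram that never occurs in the source
    is treated as if it occurred once, so the logarithm stays finite.
    """
    tgt = trigram_counts(target_sentences)
    src = trigram_counts(source_sentences)
    tgt_total = sum(tgt.values())
    src_total = sum(src.values())
    if tgt_total == 0 or src_total == 0:
        raise ValueError("both corpora must contain at least one trigram")

    divergence = 0.0
    for trigram, count in tgt.items():
        p = count / tgt_total                      # relative frequency in target
        q = src.get(trigram, 1) / src_total        # smoothed frequency in source
        divergence += p * math.log(p / q)
    return divergence


def select_source(target_sentences, sources):
    """Pick the source corpus name with the lowest divergence from the target."""
    return min(sources, key=lambda name: kl_cpos3(target_sentences, sources[name]))
```

For single-source transfer, `select_source` would be called with the tagged target text and a dictionary mapping treebank names to their tag sequences; a lower divergence is read as higher similarity. For multi-source transfer, the same divergences could be turned into weights with some decreasing function, but the abstract does not specify which one, so none is shown here.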
