Cross-Lingual Domain Adaptation for Dependency Parsing

We show how parsing can be adapted to low-resource domains by combining treebanks across languages in a single parser model with treebank embeddings. We demonstrate how in-domain treebanks from other languages can be exploited, and show that this is especially useful when only out-of-domain treebanks are available for the target language. The method also extends to low-resource languages through out-of-domain treebanks from related languages. We propose two parameter-free methods for applying treebank embeddings at test time, which give results competitive with tuned methods when applied to Twitter data and transcribed speech. Together, this yields a method for selecting treebanks and training a parser targeted at any combination of domain and language.
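
To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a multi-treebank encoder in PyTorch: a learned embedding vector for each source treebank (a language-domain pair) is concatenated to every word embedding, so one parser can be trained jointly on treebanks from several languages and domains. All class names, dimensions, and the averaging strategy at the end are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a multi-treebank encoder with treebank embeddings.
# Hypothetical names and sizes; assumes PyTorch is installed.
import torch
import torch.nn as nn

class MultiTreebankEncoder(nn.Module):
    def __init__(self, vocab_size, n_treebanks,
                 word_dim=100, tb_dim=12, hidden_dim=250):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # One learned vector per source treebank (language x domain).
        self.tb_emb = nn.Embedding(n_treebanks, tb_dim)
        self.bilstm = nn.LSTM(word_dim + tb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_ids, treebank_ids):
        # word_ids: (batch, seq_len); treebank_ids: (batch,)
        w = self.word_emb(word_ids)                       # (batch, seq, word_dim)
        t = self.tb_emb(treebank_ids)                     # (batch, tb_dim)
        t = t.unsqueeze(1).expand(-1, w.size(1), -1)      # broadcast over tokens
        states, _ = self.bilstm(torch.cat([w, t], dim=-1))
        return states  # contextual token states for the parser's scorer

# Usage: two sentences tagged with different source treebanks.
enc = MultiTreebankEncoder(vocab_size=10_000, n_treebanks=5)
out = enc(torch.randint(0, 10_000, (2, 7)), torch.tensor([0, 3]))
print(out.shape)  # torch.Size([2, 7, 500])

# At test time, for a target with no matching treebank, a parameter-free
# choice could be, e.g., averaging the learned treebank vectors instead
# of selecting a single ID (an illustrative option, not necessarily one
# of the two methods proposed in the paper).
avg_tb = enc.tb_emb.weight.mean(dim=0)  # (tb_dim,)
```

The design choice the sketch highlights is that treebank identity is a trainable input feature rather than a separate model per treebank, which is what allows in-domain treebanks from other languages to influence parsing of the target language.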
