Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank

Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively. We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings. Using dependency parsing of four diverse low-resource language varieties as a case study, we show that these methods significantly improve performance over baselines, especially in the lowest-resource cases, and demonstrate the importance of the relationship between such models' pretraining data and target language varieties.
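
The two adaptation steps named above (vocabulary augmentation and continued language-specific pretraining) can be sketched with the Hugging Face transformers and datasets libraries. This is a minimal illustration, not the authors' exact pipeline: the file name target_variety.txt, the placeholder token list, and the hyperparameters are assumptions, and add_tokens is a simplification of full WordPiece vocabulary augmentation.

```python
# Minimal sketch of adapting multilingual BERT to a low-resource language variety:
# (1) augment the vocabulary with target-language tokens, then
# (2) continue masked-language-model (MLM) pretraining on a small unlabeled corpus.
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# (1) Vocabulary augmentation: add frequent target-language tokens that mBERT's
# WordPiece vocabulary would otherwise split into many pieces. Placeholder list;
# in practice these would be mined from the unlabeled target corpus. Note that
# add_tokens inserts whole added tokens rather than extending the WordPiece
# vocabulary itself, so this is a simplification of the augmentation described above.
new_tokens = ["newtoken1", "newtoken2"]
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))  # give the new tokens trainable embeddings

# (2) Language-specific (continued) pretraining on a small unlabeled corpus,
# assumed here to be a plain-text file with one sentence per line.
raw = load_dataset("text", data_files={"train": "target_variety.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mbert-adapted",          # illustrative output path
        num_train_epochs=20,                 # illustrative hyperparameters
        per_device_train_batch_size=32,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

The adapted encoder would then be fine-tuned on the small target treebank with a standard parsing head on top, e.g. a deep biaffine attention parser, which is the setting in which the downstream gains described above are measured.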
