Does Typological Blinding Impede Cross-Lingual Sharing?

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features from databases such as the World Atlas of Language Structures (WALS) are a prime candidate for bridging this gap, as such data exists even for very low-resource languages. However, previous work has only found minor benefits from using typological information. Our hypothesis is that a model trained in a cross-lingual setting will pick up on typological cues from the input data, thus overshadowing the utility of explicitly using such features. We verify this hypothesis by blinding a model to typological information, and investigate how cross-lingual sharing and performance are impacted. Our model is based on a cross-lingual architecture in which the latent weights governing the sharing between languages are learnt during training. We show that (i) preventing this model from exploiting typology severely reduces performance, while a control experiment reaffirms that (ii) encouraging sharing according to typology somewhat improves performance.
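The abstract does not spell out the blinding mechanism. A common technique for removing a property from learned representations is a gradient reversal layer: an auxiliary predictor tries to recover the property (here, typological features) from the encoder's output, and its gradient is negated before flowing into the encoder, so the encoder is pushed to discard that information. The following is a minimal hand-computed sketch of that idea, not the paper's actual implementation; the toy encoder, predictor, and values are illustrative assumptions.

```python
def grad_reverse_forward(x):
    # Identity in the forward pass: the auxiliary predictor
    # sees the encoder output unchanged.
    return x

def grad_reverse_backward(grad, lam=1.0):
    # Negate (and optionally scale) the gradient in the backward
    # pass, so the encoder is updated to *remove* the information
    # the auxiliary predictor tries to recover.
    return -lam * grad

# Toy setup: encoder h = w * x, typology predictor p = v * h.
x, w, v = 2.0, 0.5, 3.0
h = w * x
p = v * grad_reverse_forward(h)

# Suppose the typology-prediction loss has dL/dp = 1.0.
dL_dp = 1.0
dL_dh = grad_reverse_backward(v * dL_dp)  # gradient reversed entering the encoder
dL_dw = dL_dh * x                         # encoder weight is pushed *away* from
                                          # representations that encode typology

print(dL_dh, dL_dw)  # -3.0 -6.0
```

The predictor itself still receives the ordinary (non-reversed) gradient, so it keeps trying to extract typology while the encoder learns to hide it, which is what "blinding" requires.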
