How Linguistically Fair Are Multilingual Pre-Trained Language Models?

Massively multilingual pre-trained language models, such as mBERT and XLM-RoBERTa, have received significant attention in the recent NLP literature for their impressive capability for zero-shot cross-lingual transfer of NLP tasks. This is especially promising because a large number of languages have little or no labeled data for supervised learning. Moreover, substantially improved performance on low-resource languages, without any significant degradation of accuracy on high-resource languages, leads us to believe that these models will help attain a fairer distribution of language technologies despite the prevalent, unfair, and extremely skewed distribution of resources across the world's languages. Nevertheless, these models, and the experimental approaches adopted by researchers to arrive at them, have been criticised by some for lacking a nuanced and thorough comparison of benefits across languages and tasks. A related and important question that has received little attention is how to choose from a set of models when no single model significantly outperforms the others on all tasks and languages. As we discuss in this paper, this is often the case, and the choice is usually made without a clear articulation of reasons or of the underlying fairness assumptions. In this work, we scrutinize the choices made in previous work and propose several strategies for fair and efficient model selection based on the principles of fairness in economics and social choice theory. In particular, we emphasize Rawlsian fairness, which provides an appropriate framework for making fair (with respect to languages, or tasks, or both) choices while selecting multilingual pre-trained language models for a practical or
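To make the contrast between selection criteria concrete, the sketch below compares a utilitarian choice (pick the model with the highest average accuracy across languages) with a Rawlsian maximin choice (pick the model whose worst-off language fares best). The model names and per-language scores are hypothetical, purely for illustration; they are not results from the paper.

```python
# Minimal sketch: choosing among multilingual pre-trained models given
# per-language zero-shot accuracies. Model names and scores are
# hypothetical, for illustration only.

scores = {
    "model_A": {"en": 0.90, "hi": 0.70, "sw": 0.38},
    "model_B": {"en": 0.80, "hi": 0.65, "sw": 0.50},
}

def utilitarian_choice(scores):
    """Maximize the average accuracy across languages."""
    return max(scores, key=lambda m: sum(scores[m].values()) / len(scores[m]))

def rawlsian_choice(scores):
    """Maximize the worst per-language accuracy (maximin / difference principle)."""
    return max(scores, key=lambda m: min(scores[m].values()))

print(utilitarian_choice(scores))  # model_A: higher average (0.66 vs 0.65)
print(rawlsian_choice(scores))     # model_B: better worst-off language (0.50 vs 0.38)
```

Under the utilitarian criterion the first model wins on average, while the Rawlsian criterion prefers the second because it leaves the worst-performing language better off.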
