Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While cross-lingual transfer techniques are powerful, they can carry gender bias from the source to the target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways to quantify bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces, and that the alignment direction also influences the bias observed in transfer learning. We further provide recommendations for using multilingual word representations in downstream tasks.
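As an illustration of the kind of intrinsic measurement and alignment discussed above, the sketch below (not the paper's exact metric; the embedding dictionaries, occupation list, and bilingual seed pairs are placeholders) estimates a gender direction, scores gender-neutral words against it, and lets one compare the score before and after an orthogonal, Procrustes-style alignment of a source-language space to a target space.

```python
import numpy as np

def gender_direction(emb, male_words=("he", "man"), female_words=("she", "woman")):
    """Estimate a gender direction as the mean difference of gendered word pairs."""
    diffs = [emb[m] - emb[f] for m, f in zip(male_words, female_words)]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def direct_bias(emb, neutral_words, direction):
    """Average |cosine| between gender-neutral words (e.g., occupations) and the gender direction."""
    scores = []
    for w in neutral_words:
        v = emb[w] / np.linalg.norm(emb[w])
        scores.append(abs(float(np.dot(v, direction))))
    return float(np.mean(scores))

def procrustes_align(X, Y):
    """Orthogonal map W minimizing ||XW - Y||_F, fit on rows of X (source) and Y (target)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Usage sketch (all names below are placeholders, not the paper's data):
# src_emb, tgt_emb: dicts mapping words to vectors in the source / target language spaces
# seed_pairs: bilingual dictionary of (source_word, target_word) pairs used to fit the alignment
# W = procrustes_align(np.stack([src_emb[s] for s, _ in seed_pairs]),
#                      np.stack([tgt_emb[t] for _, t in seed_pairs]))
# aligned = {w: v @ W for w, v in src_emb.items()}
# print(direct_bias(src_emb, occupations, gender_direction(src_emb)))    # before alignment
# print(direct_bias(aligned, occupations, gender_direction(aligned)))    # after alignment
```

Comparing the two scores gives a simple intrinsic view of how aligning to a given target space changes the measured gender bias.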
