Analyzing and Mitigating Gender Bias in Languages with Grammatical Gender and Bilingual Word Embeddings

Word embeddings have been shown to contain gender bias inherited from their training corpora. However, existing work focuses on quantifying and mitigating such bias in English, and the analysis cannot be directly applied to languages with grammatical gender, such as Spanish. In this paper, we propose new definitions of gender bias for languages with grammatical gender and apply bilingual word embeddings to analyze and mitigate the bias. Experimental results on a cross-lingual analogy test and the Word Embedding Association Test show that the proposed methods can effectively mitigate the multifaceted gender bias.
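The abstract mentions the Word Embedding Association Test (WEAT) as one evaluation. As a minimal sketch of how WEAT quantifies association bias, the snippet below computes the standard WEAT effect size between two target word sets and two attribute word sets; `emb` (a word-to-vector dictionary) and the word lists are illustrative assumptions, not artifacts from the paper.

```python
# Minimal WEAT effect-size sketch, assuming `emb` maps words to numpy vectors.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    # Mean similarity of word w to attribute set A minus attribute set B.
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Standardized difference of mean associations between target sets X and Y.
    sx = [association(x, A, B, emb) for x in X]
    sy = [association(y, A, B, emb) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```

A larger positive effect size indicates that the first target set (e.g., career-related words) is more strongly associated with the first attribute set (e.g., male terms) than the second target set is.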
