Investigating Failures of Automatic Translation in the Case of Unambiguous Gender

Transformer-based models are the modern workhorses of neural machine translation (NMT), achieving state-of-the-art results across several benchmarks. Despite their impressive accuracy, we observe a systemic and rudimentary class of errors these models make when translating from a language that does not mark gender on nouns into languages that do. Even when the surrounding context provides unambiguous evidence of the appropriate grammatical gender marking, no transformer-based model we tested could systematically produce correctly gendered occupation nouns. We release an evaluation scheme and dataset for measuring the ability of transformer-based NMT models to translate gender morphology correctly in unambiguous contexts across syntactically diverse sentences. Our dataset covers translation from an English source into 20 languages drawn from several different language families. With this dataset, we hope the NMT community can iterate on solutions to this class of especially egregious errors.
