Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models

All AI models are susceptible to learning biases present in the data they are trained on. For generative dialogue models, training on real human conversations containing unbalanced gender and race/ethnicity references can lead to models that display learned biases, which we define here broadly as any measurable differences in the distributions of words or semantic content of conversations across demographic groups. We measure the strength of such biases by producing artificial conversations between two copies of a dialogue model, conditioning one conversational partner to state a name commonly associated with a certain gender and/or race/ethnicity. We find that larger-capacity models tend to exhibit more gender bias and greater stereotyping of occupations by gender. We show that several methods of tuning these dialogue models, specifically name scrambling, controlled generation, and unlikelihood training, are effective in reducing bias in conversation, including on a downstream conversational task. Name scrambling is also effective in lowering differences in token usage across conversations where partners have names associated with different genders or races/ethnicities.
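To make the measurement protocol concrete, below is a minimal sketch of the name-conditioned self-chat probe: one copy of the model opens by stating a name, two copies converse, and word-usage distributions in the partner's replies are compared across demographic groups. The `dialogue_model` stand-in, the example name lists, and the use of Jensen-Shannon divergence as the comparison statistic are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a name-conditioned self-chat bias probe (illustrative assumptions throughout).
import math
from collections import Counter

FEMALE_ASSOC_NAMES = ["Martha", "Emily"]  # placeholder name lists
MALE_ASSOC_NAMES = ["Greg", "James"]


def dialogue_model(history):
    """Stand-in for a generative dialogue model's next-utterance function."""
    return "nice to meet you , " + history[-1].split()[-1].strip("!.")


def self_chat(name, turns=6):
    """Two copies of the model talk; the first speaker opens by stating a name."""
    history = [f"Hi, my name is {name}!"]
    for _ in range(turns - 1):
        history.append(dialogue_model(history))
    # Return only the partner's utterances (odd turns), whose word usage we probe.
    return [u for i, u in enumerate(history) if i % 2 == 1]


def unigram_dist(utterances):
    """Normalized unigram distribution over all tokens in a list of utterances."""
    counts = Counter(tok.lower() for u in utterances for tok in u.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two unigram distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Collect partner utterances for conversations seeded with each name group,
# then compare the resulting word distributions.
group_a = [u for n in FEMALE_ASSOC_NAMES for u in self_chat(n)]
group_b = [u for n in MALE_ASSOC_NAMES for u in self_chat(n)]
print("JS divergence across groups:", jensen_shannon(unigram_dist(group_a), unigram_dist(group_b)))
```

In practice the stand-in model would be replaced by the actual dialogue model, many names per group would be sampled, and the divergence (or another statistic over token or semantic content) would be compared before and after mitigations such as name scrambling, controlled generation, or unlikelihood training.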
