Argument from Old Man’s View: Assessing Social Bias in Argumentation

Social bias in language, directed at genders, ethnicities, ages, and other social groups, poses a problem with ethical impact for many NLP applications. Recent research has shown that machine learning models trained on such data may not only adopt but even amplify this bias. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an established metric for measuring bias in word embeddings. In a word co-occurrence analysis, we then investigate causes of bias. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of men with European-American names. Our empirical insights contribute to an understanding of bias in argumentative data sources.
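The WEAT metric mentioned above (Caliskan et al., Science 2017) compares how strongly two target word sets (e.g., male vs. female names) associate with two attribute sets (e.g., career vs. family words) via cosine similarity. A minimal sketch of the effect-size computation follows; the function names and toy two-dimensional vectors are illustrative assumptions, not the authors' code, which would operate on embeddings trained on the debate corpora.

```python
import math

def cos(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def assoc(w, A, B):
    """s(w, A, B): differential association of word vector w with
    attribute sets A and B (mean cosine to A minus mean cosine to B)."""
    return (sum(cos(w, a) for a in A) / len(A)
            - sum(cos(w, b) for b in B) / len(B))

def weat_effect_size(X, Y, A, B):
    """Cohen's-d-style WEAT effect size for target sets X, Y and
    attribute sets A, B; values lie in [-2, 2], with magnitudes near 2
    indicating a strong differential association."""
    s_all = [assoc(w, A, B) for w in list(X) + list(Y)]
    mean_x = sum(assoc(x, A, B) for x in X) / len(X)
    mean_y = sum(assoc(y, A, B) for y in Y) / len(Y)
    mean_all = sum(s_all) / len(s_all)
    # Sample standard deviation over all target words, as in the original metric.
    std = math.sqrt(sum((s - mean_all) ** 2 for s in s_all) / (len(s_all) - 1))
    return (mean_x - mean_y) / std

# Toy example (hypothetical vectors): targets X align with attribute A,
# targets Y with attribute B, so the effect size is large and positive.
X = [(1.0, 0.0), (0.9, 0.1)]
Y = [(0.0, 1.0), (0.1, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
d = weat_effect_size(X, Y, A, B)
```

Swapping the two target sets negates the effect size, which is how WEAT distinguishes the direction of a bias (e.g., which gender a corpus associates more strongly with career terms).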
