A General Methodology to Quantify Biases in Natural Language Data

Biases in data, such as gender and racial stereotypes, are propagated through intelligent systems and amplified in end-user applications. Existing studies detect and quantify biases based on pre-defined lists of sensitive attributes. In practice, however, it is difficult to compile a comprehensive list of sensitive concepts for every category of bias. We propose a general methodology that quantifies the bias of a dataset by measuring the discrepancy between its data distribution and that of a reference dataset using Maximum Mean Discrepancy (MMD). For natural language data, we show that lexicon-based features capture explicit stereotypes, while deep learning-based features further capture implicit stereotypes expressed through more complex semantics. Our method provides a more flexible way to detect potential biases.
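
Below is a minimal sketch of the two-sample comparison described above, under illustrative assumptions: TF-IDF vectors stand in for the lexicon-based features (a pretrained sentence encoder could be substituted to approximate the deep learning-based features), the two toy corpora are invented, and the median bandwidth heuristic is a common default rather than the paper's exact setup. The sketch maps both corpora into a shared feature space and computes an unbiased estimate of the squared MMD with an RBF kernel.

```python
# Sketch: estimate the squared Maximum Mean Discrepancy (MMD) between a
# target corpus and a reference corpus using lexicon-style (TF-IDF) features.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))


def mmd2_unbiased(X, Y, gamma):
    """Unbiased estimate of squared MMD between samples X (m x d) and Y (n x d)."""
    m, n = X.shape[0], Y.shape[0]
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    # Exclude diagonal terms for the unbiased within-sample averages.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = 2.0 * Kxy.mean()
    return term_xx + term_yy - term_xy


# Illustrative corpora: the dataset under audit vs. a reference dataset.
target_corpus = ["she is a nurse", "he is a doctor", "she stays home"]
reference_corpus = ["she is a doctor", "he is a nurse", "he stays home"]

# Shared lexicon-based feature space (vectorizer fit on both corpora jointly).
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(target_corpus + reference_corpus).toarray()
X = features[: len(target_corpus)]
Y = features[len(target_corpus):]

# Median heuristic for the RBF bandwidth (an assumed, common choice).
pooled = np.vstack([X, Y])
dists = np.sqrt(((pooled[:, None, :] - pooled[None, :, :]) ** 2).sum(-1))
gamma = 1.0 / (2.0 * np.median(dists[dists > 0]) ** 2)

print("Estimated squared MMD:", mmd2_unbiased(X, Y, gamma))
```

A larger estimated MMD indicates that the target corpus diverges more from the reference distribution in the chosen feature space; swapping in deep sentence embeddings would let the same estimator pick up stereotypes carried by more complex semantics.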
