Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science

In this position paper, we propose data statements as a practice that NLP technologists, in both research and development, can adopt to begin to address critical scientific and ethical issues that result from the use of data from certain populations in the development of technology for other populations. We present a form that data statements can take and explore the implications of adopting them as part of our regular practice. We argue that they will help alleviate issues related to exclusion and bias in language technology; lead to better precision in claims about how NLP research can generalize, and thus to better engineering results; protect companies from public embarrassment; and ultimately lead to language technology that meets its users in their own preferred linguistic style and does not misrepresent them to others.
