Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Lisa Anne Hendricks | William S. Isaac | John F. J. Mellor | Jonathan Uesato | Geoffrey Irving | Po-Sen Huang | Amelia Glaese | Sumanth Dathathri | Laura Weidinger | Maribeth Rauh | Iason Gabriel | Johannes Welbl
[1] Manjary P Gangan,et al. Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias , 2022, ArXiv.
[2] Miryam de Lhoneux,et al. Challenges and Strategies in Cross-Cultural NLP , 2022, ACL.
[3] Dipankar Ray,et al. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection , 2022, ACL.
[4] Phu Mon Htut,et al. BBQ: A hand-built bias benchmark for question answering , 2021, FINDINGS.
[5] Owain Evans,et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods , 2021, ACL.
[6] Po-Sen Huang,et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher , 2021, ArXiv.
[7] Po-Sen Huang,et al. Ethical and social risks of harm from Language Models , 2021, ArXiv.
[8] Leo Laugier,et al. Toxicity Detection can be Sensitive to the Conversational Context , 2021, ArXiv.
[9] Po-Sen Huang,et al. Challenges in Detoxifying Language Models , 2021, EMNLP.
[10] Michael S. Bernstein,et al. On the Opportunities and Risks of Foundation Models , 2021, ArXiv.
[11] Nanyun Peng,et al. What do Bias Measures Measure? , 2021, ArXiv.
[12] Eduard H. Hovy,et al. Five sources of bias in natural language processing , 2021, Lang. Linguistics Compass.
[13] Ruslan Salakhutdinov,et al. Towards Understanding and Mitigating Social Biases in Language Models , 2021, ICML.
[14] Christy Dennison,et al. Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets , 2021, NeurIPS.
[15] Jason Weston,et al. Bot-Adversarial Dialogue for Safe Conversational Agents , 2021, NAACL.
[16] Leonardo Neves,et al. On Transferability of Bias Mitigation Effects in Language Model Fine-Tuning , 2021, NAACL.
[17] Diyi Yang,et al. The Importance of Modeling Social Factors of Language: Theory and Practice , 2021, NAACL.
[18] Kai-Wei Chang,et al. Societal Biases in Language Generation: Progress and Challenges , 2021, ACL.
[19] Shafiq R. Joty,et al. Reliability Testing for Natural Language Processing Systems , 2021, ACL.
[20] Dan Klein,et al. Detoxifying Language Models Risks Marginalizing Minority Voices , 2021, NAACL.
[21] Tom Everitt,et al. Alignment of Language Agents , 2021, ArXiv.
[22] Emily M. Bender,et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 , 2021, FAccT.
[23] Timo Schick,et al. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP , 2021, TACL.
[24] Jackie Kay,et al. Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities , 2021, AIES.
[25] Kai-Wei Chang,et al. BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation , 2021, FAccT.
[26] Ben Hutchinson,et al. Re-imagining Algorithmic Fairness in India and Beyond , 2021, FAccT.
[27] James Zou,et al. Persistent Anti-Muslim Bias in Large Language Models , 2021, AIES.
[28] Adam Lopez,et al. Intrinsic Bias Metrics Do Not Correlate with Application Bias , 2020, ACL.
[29] Marc Dymetman,et al. A Distributional Approach to Controlled Text Generation , 2020, ICLR.
[30] Emily Denton,et al. Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure , 2020, FAccT.
[31] Shafiq R. Joty,et al. GeDi: Generative Discriminator Guided Sequence Generation , 2020, EMNLP.
[32] Siva Reddy,et al. StereoSet: Measuring stereotypical bias in pretrained language models , 2020, ACL.
[33] Kristina Lerman,et al. A Survey on Bias and Fairness in Machine Learning , 2019, ACM Comput. Surv..
[34] Jordan L. Boyd-Graber,et al. Toward Deconfounding the Effect of Entity Demographics for Question Answering Accuracy , 2021, EMNLP.
[35] Hanna M. Wallach,et al. Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets , 2021, ACL.
[36] O. Keyes. You Keep Using That Word: Ways of Thinking about Gender in Computing Research , 2021 .
[37] I. Kivlichan,et al. Capturing Covertly Toxic Speech via Crowdsourcing , 2021, HCINLP.
[38] Phil Blunsom,et al. Pitfalls of Static Language Modelling , 2021, ArXiv.
[39] Kalina Bontcheva,et al. Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis , 2020, AACL.
[40] Daniel Khashabi,et al. UNQOVERing Stereotypical Biases via Underspecified Questions , 2020, FINDINGS.
[41] Samuel R. Bowman,et al. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models , 2020, EMNLP.
[42] Shakir Mohamed,et al. Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence , 2020, Philosophy & Technology.
[43] Andrew Smart,et al. Extending the Machine Learning Abstraction Boundary: A Complex Systems Approach to Incorporate Societal Context , 2020, ArXiv.
[44] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[45] Solon Barocas,et al. Language (Technology) is Power: A Critical Survey of “Bias” in NLP , 2020, ACL.
[46] Sameer Singh,et al. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList , 2020, ACL.
[47] Alexandra Chouldechova,et al. A snapshot of the frontiers of fairness in machine learning , 2020, Commun. ACM.
[48] Emily Denton,et al. Unintended machine learning biases as social barriers for persons with disabilities , 2020, ACM SIGACCESS Access. Comput..
[49] Emily Denton,et al. Towards a critical race methodology in algorithmic fairness , 2019, FAT*.
[50] Dirk Hovy,et al. Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview , 2019, ACL.
[51] Po-Sen Huang,et al. Reducing Sentiment Bias in Language Models via Counterfactual Evaluation , 2019, FINDINGS.
[52] J. Yosinski,et al. Plug and Play Language Models: A Simple Approach to Controlled Text Generation , 2019, ICLR.
[53] Meredith Ringel Morris,et al. Toward fairness in AI for people with disabilities: a research roadmap , 2019, ACM SIGACCESS Access. Comput..
[54] Verena Rieser,et al. Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas , 2020, GEBNLP.
[55] Yonatan Belinkov,et al. Investigating Gender Bias in Language Models Using Causal Mediation Analysis , 2020, NeurIPS.
[56] Yulia Tsvetkov,et al. Fortifying Toxic Speech Detectors Against Veiled Toxicity , 2020, EMNLP.
[57] Lina Dencik,et al. What does it mean to 'solve' the problem of discrimination in hiring?: social, technical and legal perspectives from the UK on automated hiring systems , 2019, FAT*.
[58] Nanyun Peng,et al. The Woman Worked as a Babysitter: On Biases in Language Generation , 2019, EMNLP.
[59] Ming-Wei Chang,et al. Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.
[60] Lora Aroyo,et al. Crowdsourcing Subjective Tasks: The Case Study of Understanding Toxicity in Online Discussions , 2019, WWW.
[61] Sahin Cem Geyik,et al. Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search , 2019, KDD.
[62] Lucy Vasserman,et al. Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification , 2019, WWW.
[63] Danah Boyd,et al. Fairness and Abstraction in Sociotechnical Systems , 2019, FAT*.
[64] Sendhil Mullainathan,et al. Dissecting Racial Bias in an Algorithm that Guides Health Decisions for 70 Million People , 2019, FAT*.
[65] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[66] Inioluwa Deborah Raji,et al. Model Cards for Model Reporting , 2018, FAT*.
[67] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[68] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[69] Lucy Vasserman,et al. Measuring and Mitigating Unintended Bias in Text Classification , 2018, AIES.
[70] Emily M. Bender,et al. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science , 2018, TACL.
[71] Sharad Goel,et al. The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning , 2018, ArXiv.
[72] Brendan T. O'Connor,et al. Twitter Universal Dependency Parsing for African-American and Mainstream American English , 2018, ACL.
[73] Rachel Rudinger,et al. Gender Bias in Coreference Resolution , 2018, NAACL.
[74] Morgan Klaus Scheuerman,et al. Gender Recognition or Gender Reductionism?: The Social Implications of Embedded Gender Recognition Systems , 2018, CHI.
[75] Jieyu Zhao,et al. Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods , 2018, NAACL.
[76] Timnit Gebru,et al. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification , 2018, FAT*.
[77] Brendan T. O'Connor,et al. A Dataset and Classifier for Recognizing Social Media English , 2017, NUT@EMNLP.
[78] Michael Wiegand,et al. A Survey on Hate Speech Detection using Natural Language Processing , 2017, SocialNLP@EACL.
[79] Lucas Dixon,et al. Ex Machina: Personal Attacks Seen at Scale , 2016, WWW.
[80] Brendan T. O'Connor,et al. Demographic Dialectal Variation in Social Media: A Case Study of African-American English , 2016, EMNLP.
[81] Adam Tauman Kalai,et al. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , 2016, NIPS.
[82] Emily M. Bender. On Achieving and Evaluating Language-Independence in NLP , 2022, Linguistic Issues in Language Technology.
[83] Andrea Esuli,et al. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining , 2010, LREC.
[84] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[85] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[86] A. Davis. Black Feminist Thought: Knowledge, Consciousness and the Politics of Empowerment , 1993 .
[87] Linda R. Waugh. Marked and unmarked: A choice between unequals in semiotic structure , 1982 .