Challenges in Automated Debiasing for Toxic Language Detection
Yejin Choi | Maarten Sap | Noah A. Smith | Swabha Swayamdipta | Xuhui Zhou
[1] Timnit Gebru,et al. Lessons from archives: strategies for collecting sociocultural data in machine learning , 2019, FAT*.
[2] Luke Zettlemoyer,et al. Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases , 2019, EMNLP.
[3] Yue Ning,et al. Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection , 2019, ArXiv.
[4] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[5] Yejin Choi,et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics , 2020, EMNLP.
[6] Ingmar Weber,et al. Racial Bias in Hate Speech and Abusive Language Detection Datasets , 2019, Proceedings of the Third Workshop on Abusive Language Online.
[7] J. Rosa,et al. Unsettling race and language: Toward a raciolinguistic perspective , 2017, Language in Society.
[8] Gianluca Stringhini,et al. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior , 2018, ICWSM.
[9] Ingmar Weber,et al. Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.
[10] J. Hunter. African American English: A Linguistic Introduction , 2002 .
[11] Blake Lemoine,et al. Mitigating Unwanted Biases with Adversarial Learning , 2018, AIES.
[12] Yonatan Belinkov,et al. End-to-End Bias Mitigation by Modelling Biases in Corpora , 2020, ACL.
[13] Yejin Choi,et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models , 2020, Findings of EMNLP.
[14] Yejin Choi,et al. The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task , 2017, CoNLL.
[15] Carlos Ortiz,et al. Intersectional Bias in Hate Speech and Abusive Language Datasets , 2020, ArXiv.
[16] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[17] Nathan Srebro,et al. Equality of Opportunity in Supervised Learning , 2016, NIPS.
[18] Lyle H. Ungar,et al. User-Level Race and Ethnicity Predictors from Twitter Text , 2018, COLING.
[19] Ronan Le Bras,et al. Adversarial Filters of Dataset Biases , 2020, ICML.
[20] Björn Ross,et al. Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis , 2016, ArXiv.
[21] Jason Weston,et al. Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack , 2019, EMNLP.
[22] Thiago Dias Oliva,et al. Fighting Hate Speech, Silencing Drag Queens? Artificial Intelligence in Content Moderation and Risks to LGBTQ Voices Online , 2020, Sexuality & Culture.
[23] Guy Bailey,et al. African-American Language Use: Ideology and So-Called Obscenity , 2013 .
[24] Marta Dynel,et al. The landscape of impoliteness research , 2015 .
[25] Omer Levy,et al. Annotation Artifacts in Natural Language Inference Data , 2018, NAACL.
[26] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[27] Yejin Choi,et al. Social Bias Frames: Reasoning about Social and Power Implications of Language , 2020, ACL.
[28] Jieyu Zhao,et al. Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[30] Brendan T. O'Connor,et al. Demographic Dialectal Variation in Social Media: A Case Study of African-American English , 2016, EMNLP.
[31] William Yang Wang,et al. Dats Wassup!!: Investigating African-American Vernacular English in Transformer-Based Text Generation , 2020, EMNLP.
[32] Lucy Vasserman,et al. Measuring and Mitigating Unintended Bias in Text Classification , 2018, AIES.
[33] Noel Crespi,et al. Hate speech detection and racial bias mitigation in social media based on BERT model , 2020, PLoS ONE.
[34] Solon Barocas,et al. Language (Technology) is Power: A Critical Survey of “Bias” in NLP , 2020, ACL.
[35] Yulia Tsvetkov,et al. Demoting Racial Bias in Hate Speech Detection , 2020, SocialNLP.
[36] Iryna Gurevych,et al. Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance , 2020, ACL.
[37] Adam M. Croom. How to do things with slurs: Studies in the way of derogatory words , 2013 .
[38] Sarah T. Roberts,et al. Behind the Screen , 2019 .
[39] Marta Dynel. Swearing methodologically: the (im)politeness of expletives in anonymous commentaries on YouTube , 2012 .
[40] G. Kasper. Linguistic politeness: Current research issues , 1990 .
[41] J. Rosa,et al. Looking like a Language, Sounding like a Race , 2018 .
[42] Pascale Fung,et al. Reducing Gender Bias in Abusive Language Detection , 2018, EMNLP.
[43] Haohan Wang,et al. Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual , 2019, EMNLP.
[44] Christopher Potts,et al. A large annotated corpus for learning natural language inference , 2015, EMNLP.
[45] Björn Technau. Going beyond hate speech: The pragmatics of ethnic slur terms , 2018, Lodz Papers in Pragmatics.
[46] Yejin Choi,et al. The Risk of Racial Bias in Hate Speech Detection , 2019, ACL.
[47] Thomas Wolf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.