Feature-Based Explanations Don't Help People Detect Misclassifications of Online Toxicity