Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform robust learning in such an environment. We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language in a variety of adversarial settings to test their robustness. We propose and analyse several mitigating learning algorithms that identify trolls either at the example or at the user level. Our main finding is that user-based methods, which take into account that troll users will exhibit adversarial behavior across multiple examples, work best in a variety of settings on our benchmark. We then test these methods in a further real-life setting of conversations collected during deployment, with similar results.
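To make the example-level vs. user-level distinction concrete, here is a minimal sketch (not the paper's published algorithms; the per-example `suspicion` score, the `user_id` field, and the thresholds are all illustrative assumptions). The user-level variant pools suspicion scores per user, so a troll's consistent misbehavior across many examples outweighs the noise in any single one.

```python
# Minimal sketch (assumed, not the paper's exact method): contrast
# example-level filtering with user-level filtering. Each example is
# assumed to carry a "suspicion" score in [0, 1], e.g. the disagreement
# between a reference safety classifier and the user-provided label.
from collections import defaultdict
from statistics import mean

def filter_example_level(examples, threshold):
    """Drop individual examples whose suspicion score is too high."""
    return [ex for ex in examples if ex["suspicion"] <= threshold]

def filter_user_level(examples, threshold):
    """Drop every example from users whose *average* suspicion is too high.

    Pooling per user exploits the observation that trolls behave
    adversarially across multiple examples, so the aggregate score is a
    less noisy signal than any single example's score.
    """
    scores = defaultdict(list)
    for ex in examples:
        scores[ex["user_id"]].append(ex["suspicion"])
    trolls = {u for u, s in scores.items() if mean(s) > threshold}
    return [ex for ex in examples if ex["user_id"] not in trolls]

# Toy usage: u1 and u2 are helpers (u2 has one noisy example), u3 is a troll.
data = [
    {"user_id": "u1", "suspicion": 0.1}, {"user_id": "u1", "suspicion": 0.2},
    {"user_id": "u2", "suspicion": 0.7}, {"user_id": "u2", "suspicion": 0.1},
    {"user_id": "u3", "suspicion": 0.9}, {"user_id": "u3", "suspicion": 0.8},
]
print(len(filter_example_level(data, threshold=0.5)))  # 4: drops u2's noisy example too
print(len(filter_user_level(data, threshold=0.5)))     # 4: keeps all of u1 and u2, drops u3
```

Note the trade-off the sketch exposes: the example-level filter discards a helper's single noisy example, while the user-level filter forgives it because the helper's average behavior is benign.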
