Explainable Cross-Topic Stance Detection for Search Results

One way to help users navigate debated topics online is to apply stance detection in web search. Automatically identifying whether search results are against, neutral towards, or in favor of a debated topic could facilitate diversification efforts and support interventions that aim to mitigate cognitive biases. To be truly useful in this context, however, stance detection models not only need to make accurate (cross-topic) predictions but also need to be sufficiently explainable to users when applied to search results; whether current approaches meet both requirements is unclear. This paper presents a study into the feasibility of using current stance detection approaches to assist users in their web search on debated topics. We train and evaluate 10 stance detection models on a stance-annotated dataset of 1204 search results. In a preregistered user study (N = 291), we then investigate the quality of stance detection explanations created using different explainability methods and explanation visualization techniques. The models we implement predict the stances of search results across topics with satisfactory quality (i.e., comparable to the state of the art on other data types). However, our results reveal stark differences in explanation quality (measured by users' ability to simulate model predictions and by their attitudes towards the explanations) across models and explainability methods. A qualitative analysis of textual user feedback further reveals potential application areas, user concerns, and suggestions for improving such explanations. Our findings have important implications for the development of user-centered solutions surrounding web search on debated topics.
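To make the described pipeline concrete, the sketch below shows one way such a system could look: a transformer classifier assigns a search-result snippet to one of the three stance classes and a gradient-based saliency method highlights which tokens drove the prediction. This is a minimal illustration, not the authors' exact implementation; the checkpoint name, label order, and example snippet are assumptions, and in practice the model would be fine-tuned on the stance-annotated search-result data before use.

```python
# Minimal sketch of stance detection with a token-level explanation.
# Assumptions: placeholder checkpoint, illustrative label order and snippet.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["against", "neutral", "in favor"]       # assumed label order
model_name = "bert-base-uncased"                  # placeholder; would be fine-tuned
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(LABELS))
model.eval()

snippet = "Studies suggest intermittent fasting has clear health benefits."
inputs = tokenizer(snippet, return_tensors="pt", truncation=True)

# Predict the stance class for this search-result snippet.
with torch.no_grad():
    logits = model(**inputs).logits
pred = int(logits.argmax(dim=-1))
print("predicted stance:", LABELS[pred])

# Simple saliency explanation: gradient of the predicted logit w.r.t. the
# input embeddings, reduced to one score per token. Other attribution
# methods (e.g., via the Captum library) could be plugged in here instead.
embeddings = model.get_input_embeddings()(inputs["input_ids"])
embeddings.retain_grad()
out = model(inputs_embeds=embeddings,
            attention_mask=inputs["attention_mask"]).logits
out[0, pred].backward()
scores = embeddings.grad.norm(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in zip(tokens, scores.tolist()):
    print(f"{tok:>12s}  {score:.4f}")
```

Token scores like these are what the visualization techniques in the study would render, for example as word highlights alongside a search result's stance label.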
