ThreadReconstructor: Modeling Reply‐Chains to Untangle Conversational Text through Visual Analytics

We present ThreadReconstructor, a visual analytics approach for detecting and analyzing the implicit conversational structure of discussions, e.g., in political debates and forums. Our work is motivated by the need to reveal and understand individual threads in massive online conversations and verbatim text transcripts. We combine supervised and unsupervised machine learning models to generate a basic structure that is enriched by user‐defined queries and rule‐based heuristics. Depending on the data and tasks, users can create and modify reconstruction models, which are presented and compared in the visualization interface. Our tool supports exploring the generated threaded structures and analyzing the untangled reply‐chains, including comparing different models and their agreement. To make the inner workings of the models understandable, we visualize their decision spaces, including all considered candidate relations. In addition to a quantitative evaluation, we report qualitative feedback from an expert user study with four forum moderators and one machine learning expert, showing the effectiveness of our approach.
