Visual Analytics for Topic Model Optimization based on User-Steerable Speculative Execution

To effectively assess the potential consequences of human interventions in model-driven analytics systems, we establish the concept of speculative execution as a visual analytics paradigm for creating user-steerable preview mechanisms. This paper presents an explainable, mixed-initiative topic modeling framework that integrates speculative execution into the algorithmic decision-making process. Our approach visualizes the model space of our novel incremental hierarchical topic modeling algorithm, unveiling its inner workings. We support the active incorporation of the user's domain knowledge in every step through explicit model manipulation interactions. In addition, users can initialize the model with expected topic seeds, the backbone priors. For a more targeted optimization, the modeling process automatically triggers a speculative execution of various optimization strategies and requests feedback whenever the measured model quality deteriorates. Users compare the proposed optimizations to the current model state and preview their effect on the next model iterations before applying one of them. This supervised human-in-the-loop process targets maximum improvement for minimum feedback and has proven to be effective in three independent studies that confirm topic model quality improvements.
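The control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `speculate`, `run`, `step`, `quality`, `strategies`, `choose`, and `lookahead` are all hypothetical, the model is treated as an opaque object, and the quality measure is whatever function the caller supplies. The sketch only shows the pattern of running each optimization strategy on a private copy of the model, previewing it over the next few documents, and asking for feedback only when quality drops.

```python
import copy
from typing import Any, Callable, Dict, Optional, Sequence

# All names below are hypothetical placeholders, not the paper's API.
Model = Any
Step = Callable[[Model, Any], Model]        # incorporate one document
Quality = Callable[[Model], float]          # higher is better
Strategy = Callable[[Model], Model]         # one optimization strategy
Choose = Callable[[float, Dict[str, float]], Optional[str]]  # user feedback


def speculate(model: Model, upcoming: Sequence[Any],
              strategies: Dict[str, Strategy],
              step: Step, quality: Quality) -> Dict[str, float]:
    """Preview each strategy on a deep copy, then look ahead over `upcoming`."""
    previews = {}
    for name, strategy in strategies.items():
        candidate = strategy(copy.deepcopy(model))  # never touch the live model
        for doc in upcoming:
            candidate = step(candidate, doc)
        previews[name] = quality(candidate)
    return previews


def run(model: Model, docs: Sequence[Any], strategies: Dict[str, Strategy],
        step: Step, quality: Quality, choose: Choose,
        lookahead: int = 2) -> Model:
    """Incremental modeling loop that requests feedback when quality drops."""
    last_q = quality(model)
    for i, doc in enumerate(docs):
        model = step(model, doc)
        q = quality(model)
        if q < last_q:  # measured quality deteriorated
            upcoming = docs[i + 1:i + 1 + lookahead]
            previews = speculate(model, upcoming, strategies, step, quality)
            picked = choose(q, previews)  # None means "keep the current model"
            if picked is not None:
                model = strategies[picked](model)
                q = quality(model)
        last_q = q
    return model
```

In an interactive system, `choose` would be backed by the visual comparison of the current model state against the previewed alternatives; in this sketch it is simply a callback that returns the name of the strategy to apply, or `None` to reject all of them.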
