ThemeDelta: Dynamic Segmentations over Temporal Topic Models

We present ThemeDelta, a visual analytics system for extracting and visualizing temporal trends, clustering, and reorganization in time-indexed textual datasets. ThemeDelta is supported by a dynamic temporal segmentation algorithm that integrates with topic modeling algorithms to identify change points where significant shifts in topics occur. This algorithm detects not only the clustering and associations of keywords in a time period, but also their convergence into topics (groups of keywords) that may later diverge into new groups. The visual representation of ThemeDelta uses sinuous, variable-width lines to show this evolution on a timeline, with color encoding categories and line width encoding keyword strength. We demonstrate how interaction with ThemeDelta helps capture the rise and fall of topics by analyzing archives of historical newspapers, U.S. presidential campaign speeches, and social messages collected through iNeighbors, a social website. We evaluate ThemeDelta in a qualitative expert user study in which three researchers from rhetoric and history worked with the historical newspapers corpus.
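To make the idea of a change point concrete, the following is a minimal sketch, not the segmentation algorithm of ThemeDelta itself. It assumes the corpus has already been split into fixed time windows, stands in a plain keyword-frequency distribution for a topic model, and flags a boundary wherever the total variation distance between adjacent windows exceeds a threshold. The function names, the distance measure, and the threshold value are all illustrative assumptions.

```python
from collections import Counter


def keyword_distribution(docs):
    """Relative keyword frequencies for one time window (a list of strings).

    Assumption: whitespace tokenization stands in for real preprocessing
    and for the topic model that the actual system relies on.
    """
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two keyword distributions (0 to 1)."""
    words = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words)


def change_points(windows, threshold=0.5):
    """Indices i where window i differs sharply from window i - 1.

    Assumption: the threshold of 0.5 is arbitrary and chosen for illustration.
    """
    dists = [keyword_distribution(window) for window in windows]
    return [
        i
        for i in range(1, len(dists))
        if total_variation(dists[i - 1], dists[i]) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus: four time windows of one document each.
    windows = [
        ["tax economy jobs"],
        ["economy tax jobs jobs"],
        ["war security troops"],
        ["security war troops"],
    ]
    print(change_points(windows))  # prints [2]
```

In this toy input the vocabulary shifts completely between the second and third windows, so the only reported change point is index 2. A real segmentation would compare topic assignments rather than raw word counts and would choose segment boundaries jointly rather than with a fixed threshold.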
