Online Visual Analytics of Text Streams

We present an online visual analytics approach that helps users explore and understand hierarchical topic evolution in high-volume text streams. The key idea is to identify representative topics in incoming documents and align them with the existing representative topics that immediately precede them in time. To this end, we learn a set of streaming tree cuts from topic trees based on user-selected focus nodes. We developed a dynamic Bayesian network model that derives the tree cuts in the incoming topic trees while balancing the fitness of each tree cut against the smoothness between adjacent tree cuts. Connecting the corresponding topics at different times provides an overview of the evolving hierarchical topics. A sedimentation-based visualization enables interactive analysis of streaming text data, from global patterns down to local details. We evaluated our method on real-world datasets, and the results are generally favorable.
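To make the fitness/smoothness trade-off concrete, the following is a minimal sketch and not the dynamic Bayesian network model described above. It swaps in a much simpler additive objective solved by tree dynamic programming: each node carries a hypothetical fitness score, and a node that also appeared in the previous tree cut earns a smoothness bonus. The names `best_cut`, `fitness`, `prev_cut`, and `smooth_weight`, as well as the example topics and scores, are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# A topic tree maps each node to its children; leaves map to [].
Tree = Dict[str, List[str]]


def best_cut(
    tree: Tree,
    root: str,
    fitness: Dict[str, float],
    prev_cut: Set[str],
    smooth_weight: float = 1.0,
) -> Tuple[float, List[str]]:
    """Pick a tree cut (one node on every root-to-leaf path) that
    maximizes the sum of per-node fitness plus a smoothness bonus
    for nodes that were already in the previous cut.

    This is an illustrative additive objective, not the paper's model.
    """

    def score(node: str) -> float:
        bonus = smooth_weight if node in prev_cut else 0.0
        return fitness[node] + bonus

    def solve(node: str) -> Tuple[float, List[str]]:
        own = score(node)
        children = tree.get(node, [])
        if not children:
            return own, [node]
        # Best total obtainable by cutting somewhere below this node.
        child_total = 0.0
        child_cut: List[str] = []
        for child in children:
            s, cut = solve(child)
            child_total += s
            child_cut.extend(cut)
        # Cut at this node if it scores at least as well as its subtree.
        if own >= child_total:
            return own, [node]
        return child_total, child_cut

    return solve(root)


# Hypothetical topic tree and scores for one incoming time step.
tree = {
    "root": ["sports", "politics"],
    "sports": ["football", "tennis"],
    "politics": [],
    "football": [],
    "tennis": [],
}
fitness = {"root": 0.1, "sports": 1.0, "politics": 0.8,
           "football": 0.6, "tennis": 0.5}
prev_cut = {"sports", "politics"}

_, smooth = best_cut(tree, "root", fitness, prev_cut, smooth_weight=0.5)
_, greedy = best_cut(tree, "root", fitness, prev_cut, smooth_weight=0.0)
print(smooth)
print(greedy)
```

In this toy setup, a positive `smooth_weight` keeps the cut at the coarser "sports" node that the previous cut already used, whereas a weight of zero lets fitness alone decide and the cut descends to the finer subtopics. The sketch only illustrates how a smoothness term can stabilize the selected topics across adjacent time steps.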
