Incremental visual text analytics of news story development

Online news sources produce thousands of news articles every day, reporting on local and global real-world events. New information quickly replaces the old, making it difficult for readers to put current events in the context of the past. Additionally, stories have complex relationships and characteristics that are difficult to model: they can be weakly or strongly connected, and they can merge or split over time. In this paper, we present a visual analytics system for exploring news topics in dynamic information streams. It combines interactive visualization and text mining techniques to facilitate the analysis of similar topics that split and merge over time. We employ text clustering techniques to automatically extract stories from online news streams and present a visualization that: 1) shows temporal characteristics of stories in different time frames at different levels of detail; 2) allows incremental updates of the display without recalculating the visual features of past data; 3) orders the stories to minimize clutter and overlap from edge crossings. Through interaction, stories can be filtered by duration and other characteristics and then explored in full detail on demand. To demonstrate the usefulness of our system, we present case studies with real news data that show its capabilities for detailed exploration of dynamic text streams.
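
To make points 2) and 3) concrete, the following is a minimal sketch of one way stories in a newly arrived time frame could be ordered to reduce edge crossings while leaving the previous frame untouched. It uses the well-known barycenter heuristic as a stand-in; the abstract does not state which ordering method the system actually uses. The function names, the story labels, and the edge list are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def count_crossings(left_order, right_order, edges):
    """Count pairs of edges that cross between two adjacent time frames.

    left_order / right_order: story ids from top to bottom in each frame.
    edges: (left_story, right_story) links, e.g. from merges and splits.
    """
    left_pos = {s: i for i, s in enumerate(left_order)}
    right_pos = {s: i for i, s in enumerate(right_order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        # Two edges cross when their endpoints appear in opposite
        # vertical order on the left and on the right.
        if (left_pos[a] - left_pos[c]) * (right_pos[b] - right_pos[d]) < 0:
            crossings += 1
    return crossings


def barycenter_order(left_order, right_stories, edges):
    """Order the new frame by the mean position of each story's predecessors.

    The left frame is never modified, so already drawn data stays fixed
    (an assumption about how incremental updates might be realized).
    """
    left_pos = {s: i for i, s in enumerate(left_order)}

    def key(story):
        preds = [left_pos[a] for a, b in edges if b == story]
        # Newly emerged stories without predecessors are placed last.
        return sum(preds) / len(preds) if preds else float("inf")

    return sorted(right_stories, key=key)


# Hypothetical example: B and C merge into X, C also splits off Y.
left = ["A", "B", "C"]
right = ["X", "Y", "Z"]
edges = [("A", "Z"), ("B", "X"), ("C", "X"), ("C", "Y")]

before = count_crossings(left, right, edges)
new_right = barycenter_order(left, right, edges)
after = count_crossings(left, new_right, edges)
print(before, new_right, after)
```

Because only the incoming frame is reordered, the cost of an update depends on the number of stories and links in the two most recent frames rather than on the full history. The barycenter heuristic does not guarantee a minimum number of crossings in general.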
