TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections

Topic modeling, which reveals underlying topics of a document corpus, has been actively adopted in visual analytics for large-scale document collections. However, due to its significant processing time and non-interactive nature, topic modeling has so far not been tightly integrated into a visual analytics workflow. Instead, most such systems are limited to utilizing a fixed, initial set of topics. Motivated by this gap in the literature, we propose a novel interaction technique called TopicLens that allows a user to dynamically explore data through a lens interface where topic modeling and the corresponding 2D embedding are efficiently computed on the fly. To support this interaction in real time while maintaining view consistency, we propose a novel efficient topic modeling method and a semi-supervised 2D embedding algorithm. Our work is based on improving state-of-the-art methods such as nonnegative matrix factorization and t-distributed stochastic neighbor embedding. Furthermore, we have built a web-based visual analytics system integrated with TopicLens. We use this system to measure the performance and the visualization quality of our proposed methods. We provide several scenarios showcasing the capability of TopicLens using real-world datasets.
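The lens interaction described above can be illustrated with a minimal sketch: select the documents whose 2D positions fall inside a circular lens, then rerun topic modeling on that subset alone. This is not the efficient method proposed in the paper. It uses the standard multiplicative-update form of nonnegative matrix factorization as a stand-in, and it omits the semi-supervised 2D embedding step entirely. The function names, the toy term-document matrix, the vocabulary, and the lens parameters are all hypothetical.

```python
import math
import random

def select_in_lens(points, center, radius):
    """Return indices of 2D points that fall inside a circular lens."""
    cx, cy = center
    return [i for i, (x, y) in enumerate(points)
            if math.hypot(x - cx, y - cy) <= radius]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (terms x docs) as W * H using
    standard multiplicative updates (a stand-in, not the paper's method)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = [[sum(W[i][a] * V[i][j] for i in range(m)) for j in range(n)]
               for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)]
               for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(n)]
                for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = [[sum(V[i][j] * H[a][j] for j in range(n)) for a in range(k)]
               for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)]
               for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)]
                for i in range(m)]
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return W, H

def frob_error(V, W, H):
    """Frobenius norm of V - W * H."""
    k = len(H)
    return math.sqrt(sum(
        (V[i][j] - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
        for i in range(len(V)) for j in range(len(V[0]))))

# Hypothetical toy data: 4 terms, 6 documents, each with a 2D position.
vocab = ["lens", "zoom", "topic", "matrix"]
term_doc = [
    [2, 4, 0, 0, 1, 0],
    [1, 2, 0, 0, 0, 1],
    [0, 0, 1, 2, 3, 0],
    [0, 0, 3, 6, 0, 2],
]
points = [(0.1, 0.1), (0.2, 0.15), (0.15, 0.3),
          (0.3, 0.2), (0.9, 0.9), (0.8, 0.95)]

# Documents under the lens, and the sub-matrix restricted to them.
selected = select_in_lens(points, center=(0.2, 0.2), radius=0.2)
V_sub = [[row[j] for j in selected] for row in term_doc]

# Rerun topic modeling only on the documents under the lens.
W, H = nmf(V_sub, k=2)
top_terms = [vocab[max(range(len(vocab)), key=lambda i: W[i][a])]
             for a in range(2)]
print("documents in lens:", selected)
print("top term per topic:", top_terms)
```

Moving the lens changes `selected`, so each move triggers a fresh factorization on a small sub-matrix. That cost is what makes a fast topic modeling method necessary for real-time use.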
