LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation

We present LDAExplore, a tool for visualizing the topic distributions that topic modeling methods generate for a given document corpus. Latent Dirichlet Allocation (LDA) is a basic method that is predominantly used to generate topics. One problem with methods like LDA is that the users who apply them may not understand the generated topics. Users may also find it difficult to search for correlated topics and correlated documents. LDAExplore tries to alleviate these problems by visualizing the topic and word distributions generated from the document corpus and letting the user interact with them. The system is designed for users who have minimal knowledge of LDA or topic modeling methods. To evaluate our design, we run a pilot study that uses the abstracts of 322 Information Visualization papers, with every abstract treated as a document. Users then explore the generated topics. The results show that users are able to find correlated documents and group them by similar topics.
