Hiérarchie: Visualization for Hierarchical Topic Models

Existing algorithms for understanding large collections of documents often produce output that is nearly as difficult and time-consuming to interpret as reading the documents themselves. Topic modeling is a text-understanding technique that discovers the “topics,” or themes, within a collection of documents. Tools based on topic modeling grow harder to use as the number of topics needed to represent the collection increases. In this work, we present Hiérarchie, an interactive visualization that adds structure to large topic models, making them approachable and useful to an end user. We also demonstrate Hiérarchie’s ability to analyze a diverse document set about a trending news topic.
