Topic Modeling of Document Metadata for Visualizing Collaborations over Time

We describe methods for analyzing and visualizing document metadata to provide insights into collaborations over time. We use topic modeling based on Latent Dirichlet Allocation (LDA) to compute the areas of interest on which people collaborate. In a force-directed node-link graph, topics appear as persistent, fixed nodes laid out with multidimensional scaling (MDS), and people appear as transient, movable nodes. The topics are also analyzed for bursts in order to highlight "hot" topics during a time interval. As the user manipulates a time interval slider, the people nodes and links are updated dynamically. We evaluate the LDA topic modeling results for the visualization by comparing the topic keywords against the submitted keywords from the InfoVis 2004 Contest, and find that the additional terms provided by LDA-based keyword sets improve the similarity between a topic keyword set and the documents in a corpus. We also extend the InfoVis dataset from 8 to 20 years, collect publication metadata from our lab covering 21 years, and create interactive visualizations for exploring these larger datasets.
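The slider behavior described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the sample data, and the function names `links_in_interval` and `visible_people` are hypothetical, and the topic assignments are hard-coded where a topic model such as LDA would normally supply them. The sketch only shows how the people nodes and the person-topic links could be recomputed for a selected time interval.

```python
from collections import defaultdict

# Hypothetical records: (year, authors, topic_id).
# In practice topic_id would come from a topic model; here it is hard-coded.
DOCS = [
    (2001, ["ann", "bob"], 0),
    (2003, ["bob", "cy"], 1),
    (2004, ["ann", "cy"], 0),
    (2007, ["dee"], 1),
]


def links_in_interval(docs, start, end):
    """Count person-topic links for documents with start <= year <= end."""
    weights = defaultdict(int)
    for year, authors, topic in docs:
        if start <= year <= end:
            for person in authors:
                weights[(person, topic)] += 1
    return dict(weights)


def visible_people(links):
    """Return the sorted people who have at least one link in the interval."""
    return sorted({person for person, _topic in links})


if __name__ == "__main__":
    # Simulate moving the slider to the interval 2001-2004.
    links = links_in_interval(DOCS, 2001, 2004)
    print(visible_people(links))
    print(links)
```

Topic nodes stay fixed regardless of the interval, so only the link weights and the set of visible people need to be recomputed when the slider moves.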
