Visualizing Temporal-Thematic Patterns in Text Collections

Visualizing the temporal evolution of texts is relevant for many domains that seek to gain insight from text repositories. However, existing visualization methods for text collections do not show fine-grained temporal-thematic patterns. We therefore developed and analyzed a new visualization method that aims to uncover such patterns. Specifically, we project texts to one dimension, which allows us to position each text in a 2D diagram whose axes are projection space and time. For the projection, we employed two manifold learning algorithms: the self-organizing map (SOM) and UMAP. To assess the utility of our method, we experimented with real-world datasets and discuss the resulting visualizations. We find that our method facilitates relating patterns and extracting the associated texts beyond what is possible with previous techniques. We also conducted interviews with historians to show that our prototypical system supports domain experts in their analysis tasks.

CCS Concepts • Applied computing → Document searching; • Information systems → Search interfaces; • Human-centered computing → Visualization techniques;
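The core idea, projecting each text to one dimension and plotting it against time, can be illustrated with a minimal sketch. The following is not the authors' implementation: it assumes TF-IDF text vectors and a small hand-written one-dimensional SOM in pure Python, and the function names (`tfidf_vectors`, `train_som_1d`, `project_to_timeline`), the parameters, and the toy documents are all hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(token_lists):
    """Return L2-normalized TF-IDF vectors for a list of token lists."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    n_docs = len(token_lists)
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed idf, so every term keeps a positive weight.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec])
    return vectors


def best_unit(units, x):
    """Index of the SOM unit closest to x (squared Euclidean distance)."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)))


def train_som_1d(vectors, n_units=4, epochs=50, seed=0):
    """Train a 1D SOM: a chain of units with a Gaussian neighborhood."""
    rng = random.Random(seed)
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = 0.5 * (1 - frac)                        # decaying learning rate
            sigma = max(n_units / 2 * (1 - frac), 0.5)   # shrinking neighborhood
            x = vectors[i]
            bmu = best_unit(units, x)
            for j, w in enumerate(units):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return units


def project_to_timeline(docs, n_units=4):
    """Map (time, text) pairs to (time, unit index) diagram coordinates."""
    vectors = tfidf_vectors([text.lower().split() for _, text in docs])
    units = train_som_1d(vectors, n_units=n_units)
    return [(time, best_unit(units, vec))
            for (time, _), vec in zip(docs, vectors)]


# Hypothetical toy collection: (year, text).
docs = [
    (1901, "harvest grain market price"),
    (1902, "grain price harvest tax"),
    (1903, "railway station steam engine"),
    (1904, "steam engine railway bridge"),
    (1905, "harvest grain market price"),
]
positions = project_to_timeline(docs)
for year, unit in positions:
    print(year, unit)
```

Each output pair is one point in the diagram: the year gives the position along the time axis and the unit index gives the position along the projection axis. Because neighboring SOM units tend to hold similar vectors, thematically related texts should end up close together on the projection axis. A UMAP-based variant would replace the SOM step with a reduction to a single component.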
