Statistical topic modeling is an increasingly popular approach to text analysis. Many existing visualization tools focus on analyzing the model itself, apart from the documents on which it was trained. In contrast, we seek to treat the model as a lens through which to view the original documents. This enables readers to observe trends and build hypotheses at multiple scales, ranging from across a corpus to within a single text, and to find both algorithmic data and textual examples to defend these hypotheses. Supporting this workflow requires a multi-tiered framework that affords comparisons at three levels: the entire corpus, small sets of documents, and a single document. We provide such a tool in our implementation of Serendip, a web application that combines view-coordinated reorderable matrices, small-multiples displays, and tagged text so that readers can develop insight at any one level and carry it into their analysis of the others.