Serendip : Turning Topics Back to the Text

Statistical topic modeling is an increasingly popular approach to text analysis. Many existing visualization tools focus on analyzing the model itself, distinct from the documents upon which it was trained. In contrast, we seek to treat the model as a lens through which to view the original documents. This would enable the reader to observe trends and build hypotheses at multiple scales—ranging from across a corpus to within a single text—and find both algorithmic data and textual examples to defend these hypotheses. Supporting this workflow requires a multi-tiered framework that affords comparisons at three levels: the entire corpus, small sets of documents, and a single document. We provide such a tool in our implementation of Serendip, a web-application that combines view-coordinated reorderable matrices, small multiples displays, and tagged text in order to allow readers to develop insight at multiple levels and carry that insight into their analysis of the others.