TextDNA: Visualizing Word Usage with Configurable Colorfields

Patterns of words used in different text collections can characterize interesting properties of a corpus. However, these patterns are challenging to explore because they often involve complex relationships across many words and collections in a large space of words. In this paper, we propose a configurable colorfield design to aid this exploration. Our approach uses a dense colorfield overview to present large amounts of data in ways that make patterns perceptible. It allows flexible configuration of both data mappings and aggregations to expose different kinds of patterns, and provides interactions to help connect detailed patterns to the corpus overview. TextDNA, our prototype implementation, leverages the GPU to provide interactivity in the web browser even on large corpora. We present five case studies showing how the tool supports inquiry in corpora ranging in size from a single document to millions of books. Our work shows how to make a configurable colorfield approach practical for a range of analytic tasks.
