Document visualization: an overview of current research

As the number of sources and the quantity of document information grow rapidly, efficient and intuitive visualization tools are needed to help users understand the contents and features of a document and discover hidden information. This overview introduces the fundamental concepts and designs of document visualization, a number of representative methods in the field, and the challenges and promising directions for future development. The focus is on explaining the rationale and characteristics of representative document visualization methods in each category. A discussion of the limitations of our classification and a comparison of the reviewed methods are presented at the end. The overview also aims to point out theoretical and practical challenges in document visualization.

WIREs Comput Stat 2014, 6:19–36. doi: 10.1002/wics.1285

Conflict of interest: The authors have declared no conflicts of interest for this article.
