Document Towers: A MATLAB software implementing a three-dimensional architectural paradigm for the visual exploration of digital documents and libraries

This article introduces the generic Document Towers paradigm, visualization, and software for visualizing the structure of paginated documents, based on the metaphor of documents-as-architecture. Document Towers visualizations resemble three-dimensional building models: they represent the physical boundaries of logical (e.g., titles, images), semantic (e.g., topics, named entities), graphical (e.g., typefaces, colors), and other types of information with spatial extent as a stack of rooms and floors. The software takes as input user-supplied JSON-formatted coordinates and labels of document entities, or extracts them directly from ALTO and InDesign IDML files. The Document Towers paradigm and visualization enable information systems to support information behaviors other than goal-oriented searches. Visualization encourages exploration by generating panoramic overviews and fostering serendipitous insights, while the metaphor aids comprehension of the representations by drawing on a familiar cognitive model. Document Towers visualizations also give access to types of information other than textual content, specifically through their physical structure, which corresponds to the material, logical, semantic, and contextual aspects of documents. Visualization renders documents transparent, making the invisible visible and enabling analysis at a glance, without the need for physical manipulation. Keyword searches and other language-based interactions with documents require clearly formulated queries and return answers only to the questions asked; visual observation, by contrast, is well suited to fuzzy goals and to uncovering unexpected aspects of the data.

© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
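To make the rooms-and-floors mapping concrete, here is a minimal Python sketch of the idea. It is not the MATLAB software itself. The JSON layout and its field names (`pages`, `width`, `height`, `entities`, `label`, `bbox`), the `Room` class, and the `build_tower` function are all hypothetical and do not reproduce the software's actual input format. The sketch assumes that each page becomes one floor of fixed height, stacked in page order. It also assumes that each labeled entity's bounding box, normalized by page size, becomes the footprint of one room on that floor.

```python
import json
from dataclasses import dataclass


@dataclass
class Room:
    """One labeled document entity rendered as an axis-aligned 3-D box."""
    label: str
    page: int
    x0: float  # footprint left edge, normalized to page width (0..1)
    y0: float  # footprint top edge, normalized to page height (0..1)
    z0: float  # bottom of the floor this room sits on
    x1: float  # footprint right edge, normalized
    y1: float  # footprint bottom edge, normalized
    z1: float  # top of the floor this room sits on


def build_tower(doc_json: str, floor_height: float = 1.0) -> list[Room]:
    """Stack pages as floors and entity boxes as rooms.

    The JSON schema used here is hypothetical, for illustration only.
    """
    doc = json.loads(doc_json)
    rooms = []
    for i, page in enumerate(doc["pages"]):
        pw, ph = page["width"], page["height"]
        z0 = i * floor_height  # page i occupies the i-th floor above ground
        for ent in page["entities"]:
            # Assumed bbox convention: left, top, width, height in page units.
            x, y, w, h = ent["bbox"]
            rooms.append(Room(
                label=ent["label"],
                page=i,
                x0=x / pw, y0=y / ph, z0=z0,
                x1=(x + w) / pw, y1=(y + h) / ph, z1=z0 + floor_height,
            ))
    return rooms


# Hypothetical two-page input: a title and an image on page 1, a title on page 2.
SAMPLE = """{"pages": [
  {"width": 600, "height": 800, "entities": [
    {"label": "title", "bbox": [60, 40, 480, 80]},
    {"label": "image", "bbox": [60, 200, 300, 300]}]},
  {"width": 600, "height": 800, "entities": [
    {"label": "title", "bbox": [60, 40, 480, 80]}]}
]}"""

for room in build_tower(SAMPLE):
    print(room)
```

Normalizing footprints by page size keeps floors aligned even when page dimensions differ. Rendering, such as coloring rooms by label, is left out of this sketch.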
