A systematic method for surveying data visualizations and a resulting genomic epidemiology visualization typology: GEViT

Abstract

Motivation: Data visualization is an important tool for exploring and communicating findings from genomic and healthcare datasets. Yet, without a systematic way of organizing and describing the design space of data visualizations, researchers may not be aware of the breadth of possible visualization design choices or how to distinguish between good and bad options.

Results: We have developed a method that systematically surveys data visualizations by analyzing both text and images. Our method supports the construction of a visualization design space that is explorable along two axes: why the visualization was created and how it was constructed. We applied our method to a corpus of scientific research articles from infectious disease genomic epidemiology and derived a Genomic Epidemiology Visualization Typology (GEViT) that describes how visualizations were created from a series of chart types, combinations and enhancements. We have also implemented an online gallery that allows others to explore the resulting design space of visualizations. Our results have important implications for visualization design and for researchers intending to develop or use data visualization tools. Finally, the method that we introduce can be extended to construct visualization design spaces in other research areas.

Availability and implementation: Our browsable gallery is available at http://gevit.net and all project code can be found at https://github.com/amcrisan/gevitAnalysisRelease.

Supplementary information: Supplementary data are available at Bioinformatics online.
