A method for systematically surveying data visualizations in infectious disease genomic epidemiology

Data visualization is an important tool for exploring and communicating findings from genomic and health datasets. Yet, without a systematic way of understanding the design space of data visualizations, researchers do not have a clear sense of what kinds of visualizations are possible, or how to distinguish between good and bad options. We have devised an approach that uses both literature mining and human-in-the-loop analysis to construct a visualization design space from a corpus of scientific research papers. We ascertain why and what visualizations were created, and how they were constructed. We applied our approach to derive a Genomic Epidemiology Visualization Typology (GEViT) and operationalized our results to produce an explorable gallery of the visualization design space containing hundreds of categorized visualizations. We are the first to take such a systematic approach to visualization analysis, and future visualization tool developers can apply it to areas beyond genomic epidemiology.
