EMPress Enables Tree-Guided, Interactive, and Exploratory Analyses of Multi-omic Data Sets

ABSTRACT Standard workflows for analyzing microbiomes often include the creation and curation of phylogenetic trees. Here we present EMPress, an interactive web tool for visualizing trees in the context of microbiome, metabolome, and other community data that scales to trees with well over 500,000 nodes. EMPress provides novel functionality, including ordination integration and animations, alongside many standard tree visualization features, and thus simplifies exploratory analyses of many forms of ’omic data.

IMPORTANCE Phylogenetic trees are integral data structures for the analysis of microbial communities. Recent work has also shown the utility of trees constructed from certain metabolomic data sets, further highlighting their importance in microbiome research. The ever-growing scale of modern microbiome surveys has led to numerous challenges in visualizing these data. In this paper, we use five diverse data sets to showcase the versatility and scalability of EMPress, an interactive web visualization tool. EMPress addresses the growing need for exploratory analysis tools that can accommodate large, complex multi-omic data sets.