Visualizing ’omic feature rankings and log-ratios using Qurro

Abstract

Many tools for analyzing compositional ’omics data produce feature-wise values that can be ranked to describe each feature’s association with some source of variation. These values include differentials, which describe features’ associations with specified covariates, and feature loadings, which describe features’ associations with variation along a given axis of a biplot. Prior work has discussed using these ‘rankings’ as a starting point for exploring the log-ratios of particularly high- or low-ranked features, but such exploratory analyses have so far relied on custom code to visualize the feature rankings and the log-ratios of interest. This approach is laborious and error-prone, and it raises questions about reproducibility. To address these problems, we introduce Qurro, a tool that interactively visualizes a plot of feature rankings (a ‘rank plot’) alongside a plot of the selected features’ log-ratios within samples (a ‘sample plot’). Qurro’s interface includes various controls that let users select features from the rank plot to compute a log-ratio; this action updates both the rank plot, by highlighting the selected features, and the sample plot, by displaying each sample’s current log-ratio. Here, we demonstrate how this unique interface helps users explore feature rankings and log-ratios simply and effectively.
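The selection-then-log-ratio step described in the abstract can be sketched in plain Python. This is a minimal illustration, not Qurro's implementation or API. The helper names `select_extremes` and `log_ratio` are hypothetical. The sketch assumes that a ranking is a mapping from feature ID to one value (for example, a differential for one covariate or a loading along one biplot axis). It also assumes that the counts of the selected numerator and denominator features are summed within each sample, that the natural log is used, and that samples whose numerator or denominator sum is zero are left out because their log-ratio is undefined.

```python
import math
from typing import Dict, List, Optional, Tuple


def select_extremes(rankings: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (k highest-ranked, k lowest-ranked) feature IDs.

    Mimics picking features from the two ends of a rank plot to use as
    the numerator and denominator of a log-ratio.
    """
    if k < 1 or 2 * k > len(rankings):
        raise ValueError("k must be >= 1 and the two selections must not overlap")
    ordered = sorted(rankings, key=lambda f: rankings[f])  # ascending by ranking value
    return ordered[-k:], ordered[:k]


def log_ratio(
    sample_counts: Dict[str, float], numerator: List[str], denominator: List[str]
) -> Optional[float]:
    """Hypothetical helper: ln(sum of numerator counts / sum of denominator counts).

    Returns None when either sum is zero (assumed here: such samples are dropped).
    """
    num = sum(sample_counts.get(f, 0) for f in numerator)
    den = sum(sample_counts.get(f, 0) for f in denominator)
    if num == 0 or den == 0:
        return None
    return math.log(num / den)


# Toy, made-up data for illustration only.
rankings = {"F1": 2.5, "F2": -1.8, "F3": 0.3, "F4": 1.9, "F5": -2.2}
counts = {
    "S1": {"F1": 10, "F2": 5, "F3": 7, "F4": 6, "F5": 0},
    "S2": {"F1": 0, "F2": 3, "F3": 4, "F4": 0, "F5": 8},
}

top, bottom = select_extremes(rankings, k=2)  # (["F4", "F1"], ["F5", "F2"])
ratios = {sid: log_ratio(c, top, bottom) for sid, c in counts.items()}
# S1: ln(16 / 5); S2: None because its numerator sum is 0
print(ratios)
```

Dropping zero-sum samples instead of adding a pseudocount keeps the sketch free of an extra tuning parameter; a real analysis may handle zeros differently.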
