Visualizing balances of compositional data: A new alternative to balance dendrograms

Balances have become a cornerstone of compositional data analysis. However, conceptualizing balances is difficult, especially for high-dimensional data. Investigators most often visualize balances with the balance dendrogram, but this technique is not necessarily intuitive and does not scale well to large data sets. This manuscript introduces the 'balance' package for the R programming language, which visualizes balances of compositional data using an alternative to the balance dendrogram. The alternative contains the same information encoded by the balance dendrogram, but projects the data onto a common scale that facilitates direct comparisons and accommodates high-dimensional data. By stripping the branches from the tree, 'balance' can cleanly visualize any subset of balances without disrupting the interpretation of the remaining balances. As an example, the package is applied to a publicly available metagenomics data set measuring the relative abundance of 500 microbial taxa.
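
For readers unfamiliar with the quantity being visualized, the following is a minimal sketch of how a balance is commonly computed: the normalized log-ratio of the geometric means of two disjoint groups of parts, with the groups defined by the rows of a serial binary partition. It is written in Python for illustration only and is not the 'balance' package's API; the function names `balance` and `balances_from_sbp`, the example composition, and the example partition are all hypothetical.

```python
import math


def balance(composition, numerator, denominator):
    """Balance between two disjoint groups of parts, given as index lists.

    Assumes every part in the composition is strictly positive.
    """
    r, s = len(numerator), len(denominator)
    # The mean of the logs equals the log of the group's geometric mean.
    log_gm_num = sum(math.log(composition[i]) for i in numerator) / r
    log_gm_den = sum(math.log(composition[i]) for i in denominator) / s
    # The normalizing constant sqrt(rs / (r + s)) makes the balance basis orthonormal.
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)


def balances_from_sbp(composition, sbp):
    """Compute one balance per row of a serial binary partition (SBP).

    In each row, +1 marks a numerator part, -1 a denominator part,
    and 0 a part not involved in that balance.
    """
    result = []
    for row in sbp:
        numerator = [i for i, sign in enumerate(row) if sign == 1]
        denominator = [i for i, sign in enumerate(row) if sign == -1]
        result.append(balance(composition, numerator, denominator))
    return result


# Hypothetical 4-part composition and a partition with D - 1 = 3 balances.
x = [0.1, 0.2, 0.3, 0.4]
sbp = [
    [1, 1, -1, -1],  # parts 1-2 versus parts 3-4
    [1, -1, 0, 0],   # part 1 versus part 2
    [0, 0, 1, -1],   # part 3 versus part 4
]
b = balances_from_sbp(x, sbp)
```

Because each balance depends only on ratios between parts, rescaling the composition (for example, from proportions to percentages) leaves every balance unchanged. Each balance is also a single real number, which is what allows a set of balances to be drawn on one common scale.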
