Linear association in compositional data analysis

With compositional data, ordinary covariation indexes designed for real random variables fail to describe dependence, so compositional alternatives to covariance and correlation are needed. Based on the Euclidean structure of the simplex, known as the Aitchison geometry, compositional association is identified with a linear restriction of the sample space, which occurs when a log-contrast is constant. To simplify interpretation, a sparse and simple version of compositional association, called b-association, is defined in terms of balances that are constant across the sample. This kind of association between compositional variables is extended to association between groups of compositional variables. In practice, exact b-association seldom occurs, so measures of the degree of b-association are reviewed, building on those previously proposed. Some techniques for testing b-association are also studied. These techniques are applied to available oral microbiome data to illustrate both their advantages and their difficulties. Both testing and measurement of b-association appear to be quite sensitive to heterogeneities in the studied populations and to outliers.
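To make the idea of a balance that is constant across the sample concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard balance between two groups of parts, sqrt(rs/(r+s)) times the log-ratio of their geometric means, and uses the sample variance of that balance as one plausible indicator of how close the data are to exact b-association. The function names `balance` and `balance_variance` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def balance(X, num, den):
    """Balance between two groups of parts for each row of X.

    X   : (n_samples, n_parts) array of strictly positive compositions
    num : column indices of the numerator group (r parts)
    den : column indices of the denominator group (s parts)
    """
    X = np.asarray(X, dtype=float)
    r, s = len(num), len(den)
    # Log of the geometric mean of each group, row by row
    log_gm_num = np.log(X[:, num]).mean(axis=1)
    log_gm_den = np.log(X[:, den]).mean(axis=1)
    # Normalizing constant that makes the balance an orthonormal coordinate
    return np.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

def balance_variance(X, num, den):
    """Sample variance of the balance (assumed indicator: ~0 means near b-association)."""
    return float(np.var(balance(X, num, den), ddof=1))

# Synthetic 3-part data (illustrative only): part 1 is exactly twice part 0,
# so the balance between parts 0 and 1 is constant across the sample.
rng = np.random.default_rng(0)
x0 = rng.lognormal(size=50)
raw = np.column_stack([x0, 2.0 * x0, rng.lognormal(size=50)])
comp = raw / raw.sum(axis=1, keepdims=True)  # closure to unit sum

var_01 = balance_variance(comp, [0], [1])  # essentially zero: parts 0 and 1 are b-associated
var_02 = balance_variance(comp, [0], [2])  # clearly positive: no such association
```

Because a balance is a log-ratio, closing the data to unit sum does not change it; only the relative information between the two groups of parts matters.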
