Towards a pragmatic approach to compositional data analysis

Compositional data are nonnegative data with the property of closure: that is, the values of the components, or so-called parts, of each observation have a fixed sum, usually 1 or 100%. The approach to compositional data analysis originated by John Aitchison uses ratios of parts as the fundamental starting point for description and modeling. I show that a compositional data set can be effectively replaced by a set of ratios numbering one less than the number of parts, and that these ratios describe an acyclic connected graph of all the parts. Contrary to recent literature, I argue that the additive log-ratio transformation can be an excellent substitute for the original data set, as demonstrated on an archaeological data set as well as on three other examples. I propose further that a smaller set of ratios of parts can be determined, either by expert choice or by automatic selection, that explains as much of the variance as is required for all practical purposes. These part ratios can then be validly summarized and analyzed by conventional univariate methods, as well as by multivariate methods, where the ratios are preferably log-transformed.
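To make the additive log-ratio (ALR) transformation concrete, the following is a minimal sketch in Python with NumPy. It is not taken from the paper: the function names `closure`, `alr` and `alr_inverse`, the choice of the last part as the default reference, and the toy three-part data are all assumptions made for illustration. For J parts, the ALR divides J - 1 of the parts by one reference part and takes logarithms, which gives J - 1 log-ratios.

```python
import numpy as np


def closure(x):
    """Rescale each row of a positive data matrix so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)


def alr(x, ref=-1):
    """Additive log-ratios: log of every non-reference part over the reference part.

    x   : (n_samples, n_parts) array of strictly positive values
    ref : column index of the reference part (hypothetical default: the last part)
    """
    x = closure(x)
    ref = ref % x.shape[1]               # allow negative indices
    numerators = np.delete(x, ref, axis=1)
    return np.log(numerators / x[:, [ref]])


def alr_inverse(y, ref=-1):
    """Recover the closed composition from its ALR coordinates."""
    y = np.asarray(y, dtype=float)
    n_parts = y.shape[1] + 1
    ref = ref % n_parts
    # The reference part has ratio 1 to itself, so re-insert a column of ones.
    ratios = np.insert(np.exp(y), ref, 1.0, axis=1)
    return closure(ratios)


# Toy data (made up for illustration): two samples, three parts.
parts = np.array([[0.2, 0.3, 0.5],
                  [0.1, 0.6, 0.3]])
logratios = alr(parts)                   # shape (2, 2): one column fewer than parts
recovered = alr_inverse(logratios)       # equals `parts` up to rounding
```

In graph terms, each ratio is an edge joining two parts. The ALR ratios all share the reference part, so they form a star, which is one particular acyclic connected graph on all the parts. The round trip through `alr_inverse` illustrates that no information about the composition is lost.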
