Compositional Biplots: A Story of False Leads and Hidden Features Revealed by the Last Dimensions

Logratio principal component analysis is often one of the first steps in exploring a compositional data set. Compositional biplots based on the first two principal components are frequently used to uncover proportionality between parts or to detect one-dimensional patterns of variability in larger subcompositions. This article argues that this approach is likely to produce false leads and proposes an alternative procedure based on condition indices and low-variance principal components. We advocate calculating condition indices and combining them with biplots of the last few principal components and with lists of subcompositions that have large condition numbers; together, these tools are shown to be useful for detecting proportionality and one-dimensional relationships. The detection of such patterns in compositional data sets is shown to be closely related to the analysis of multicollinearity in linear regression. Two example data sets, amino acid compositions in calves and chemical components of coffee aroma, serve as illustrations.
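The idea of looking at the last dimensions can be illustrated with a minimal sketch. It is not the article's implementation: it assumes a centred logratio (clr) transform, a plain singular value decomposition, and condition indices defined in the usual regression-diagnostics sense as the ratio of the largest singular value to each singular value. The function names (`clr`, `logratio_pca`, `condition_indices`) and the simulated five-part data, in which the second part is nearly proportional to the first, are hypothetical and chosen only for illustration.

```python
import numpy as np

def clr(X):
    """Centred logratio transform of a matrix of compositions (rows sum to 1)."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def logratio_pca(X):
    """SVD of the column-centred clr data; returns singular values and loadings."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    return s, Vt

def condition_indices(s):
    """Largest singular value divided by each singular value.

    The final singular value is dropped because clr data with D parts have
    rank at most D - 1, so it is zero by construction (assumes more rows than parts).
    """
    s = s[:-1]
    return s[0] / s

# Simulated data (an assumption for illustration): part 1 is roughly 2 x part 0.
rng = np.random.default_rng(0)
n, D = 200, 5
logs = rng.normal(size=(n, D))
logs[:, 1] = logs[:, 0] + np.log(2.0) + 0.01 * rng.normal(size=n)
X = np.exp(logs)
X /= X.sum(axis=1, keepdims=True)  # closure to unit sum

s, Vt = logratio_pca(X)
ci = condition_indices(s)
v_last = Vt[-2]  # last non-trivial principal axis (Vt[-1] is the structural zero)

print("condition indices:", np.round(ci, 1))
print("loadings of last non-trivial axis:", np.round(v_last, 3))
```

In this simulated setting the largest condition index is large, and the loadings of the last non-trivial axis concentrate on the two nearly proportional parts with opposite signs, which is the kind of signal a biplot of the first two components can miss.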
