Principal component analysis of compositional data

SUMMARY Compositional data, consisting of vectors of proportions, have proved difficult to handle statistically because of the awkward constraint that the components of each vector must sum to unity. Moreover, such data sets frequently display marked curvature, so that linear techniques such as standard principal component analysis are likely to prove inadequate. From a critical re-examination of previous approaches we evolve, through adaptation of recently introduced transformation techniques for compositional data analysis, a log-linear contrast form of principal component analysis, and illustrate its advantages in applications.
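
The summary does not spell out the computation, so the following is only a minimal sketch of one way a log-linear contrast principal component analysis might be carried out. It assumes the transformation is the centred log-ratio (each log component minus the mean log of its vector) and that all components are strictly positive. The function names `clr` and `log_contrast_pca` and the simulated data are hypothetical, introduced purely for illustration.

```python
import numpy as np


def clr(X):
    """Centred log-ratio transform of each row (assumed transformation)."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("all components must be strictly positive")
    # Re-close each row so that its components sum to unity.
    X = X / X.sum(axis=1, keepdims=True)
    logX = np.log(X)
    # Subtract the row mean of the logs (the log of the geometric mean).
    return logX - logX.mean(axis=1, keepdims=True)


def log_contrast_pca(X):
    """Eigen-decomposition of the covariance of clr-transformed data."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)                 # centre each column
    cov = Zc.T @ Zc / (Z.shape[0] - 1)      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Zc @ eigvecs                   # component scores per observation
    return eigvals, eigvecs, scores


# Simulated 4-part compositions (illustrative data, not from the paper).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(50, 4))
comps = raw / raw.sum(axis=1, keepdims=True)

eigvals, eigvecs, scores = log_contrast_pca(comps)
```

Under these assumptions, each eigenvector with a nonzero eigenvalue has coefficients summing to zero, so the corresponding component is a log-linear contrast of the original proportions. The last eigenvalue is zero up to rounding, which reflects the unit-sum constraint.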
