The Statistical Analysis of Compositional Data

1 Compositional data: some challenging problems.- 1.1 Introduction.- 1.2 Geochemical compositions of rocks.- 1.3 Sediments at different depths.- 1.4 Ternary diagrams.- 1.5 Partial analyses and subcompositions.- 1.6 Supervisory behaviour.- 1.7 Household budget surveys.- 1.8 Steroid metabolite patterns in adults and children.- 1.9 Activity patterns of a statistician.- 1.10 Calibration of white-cell compositions.- 1.11 Fruit evaluation.- 1.12 Firework mixtures.- 1.13 Clam ecology.- 1.14 Bibliographic notes.- Problems.
2 The simplex as sample space.- 2.1 Choice of sample space.- 2.2 Compositions and simplexes.- 2.3 Spaces, vectors, matrices.- 2.4 Bases and compositions.- 2.5 Subcompositions.- 2.6 Amalgamations.- 2.7 Partitions.- 2.8 Perturbations.- 2.9 Geometrical representations of compositional data.- 2.10 Bibliographic notes.- Problems.
3 The special difficulties of compositional data analysis.- 3.1 Introduction.- 3.2 High dimensionality.- 3.3 Absence of an interpretable covariance structure.- 3.4 Difficulty of parametric modelling.- 3.5 The mixture variation difficulty.- 3.6 Bibliographic notes.- Problems.
4 Covariance structure.- 4.1 Fundamentals.- 4.2 Specification of the covariance structure.- 4.3 The compositional variation array.- 4.4 Recovery of the compositional variation array from the crude mean vector and covariance matrix.- 4.5 Subcompositional analysis.- 4.6 Matrix specifications of covariance structures.- 4.7 Some important elementary matrices.- 4.8 Relationships between the matrix specifications.- 4.9 Estimated matrices for hongite compositions.- 4.10 Logratios and logcontrasts.- 4.11 Covariance structure of a basis.- 4.12 Commentary.- 4.13 Bibliographic notes.- Problems.
5 Properties of matrix covariance specifications.- 5.1 Logratio notation.- 5.2 Logcontrast variances and covariances.- 5.3 Permutations.- 5.4 Properties of P and QP matrices.- 5.5 Permutation invariants involving Σ.- 5.6 Covariance matrix inverses.- 5.7 Subcompositions.- 5.8 Equivalence of characteristics of Τ, Σ, Γ.- 5.9 Logratio-uncorrelated compositions.- 5.10 Isotropic covariance structures.- 5.11 Bibliographic notes.- Problems.
6 Logistic normal distributions on the simplex.- 6.1 Introduction.- 6.2 The additive logistic normal class.- 6.3 Density function.- 6.4 Moment properties.- 6.5 Composition of a lognormal basis.- 6.6 Class-preserving properties.- 6.7 Conditional subcompositional properties.- 6.8 Perturbation properties.- 6.9 A central limit theorem.- 6.10 A characterization by logcontrasts.- 6.11 Relationships with the Dirichlet class.- 6.12 Potential for statistical analysis.- 6.13 The multiplicative logistic normal class.- 6.14 Partitioned logistic normal classes.- 6.15 Some notation.- 6.16 Bibliographic notes.- Problems.
7 Logratio analysis of compositions.- 7.1 Introduction.- 7.2 Estimation of μ and Σ.- 7.3 Validation: tests of logistic normality.- 7.4 Hypothesis testing strategy and techniques.- 7.5 Testing hypotheses about μ and Σ.- 7.6 Logratio linear modelling.- 7.7 Testing logratio linear hypotheses.- 7.8 Further aspects of logratio linear modelling.- 7.9 An application of logratio linear modelling.- 7.10 Predictive distributions, atypicality indices and outliers.- 7.11 Statistical discrimination.- 7.12 Conditional compositional modelling.- 7.13 Bibliographic notes.- Problems.
8 Dimension-reducing techniques.- 8.1 Introduction.- 8.2 Crude principal component analysis.- 8.3 Logcontrast principal component analysis.- 8.4 Applications of logcontrast principal component analysis.- 8.5 Subcompositional analysis.- 8.6 Applications of subcompositional analysis.- 8.7 Canonical component analysis.- 8.8 Bibliographic notes.- Problems.
9 Bases and compositions.- 9.1 Fundamentals.- 9.2 Covariance relationships.- 9.3 Principal and canonical component comparisons.- 9.4 Distributional relationships.- 9.5 Compositional invariance.- 9.6 An application to household budget analysis.- 9.7 An application to clinical biochemistry.- 9.8 Reappraisal of an early shape and size analysis.- 9.9 Bibliographic notes.- Problems.
10 Subcompositions and partitions.- 10.1 Introduction.- 10.2 Complete subcompositional independence.- 10.3 Partitions of order 1.- 10.4 Ordered sequences of partitions.- 10.5 Caveat.- 10.6 Partitions of higher order.- 10.7 Bibliographic notes.- Problems.
11 Irregular compositional data.- 11.1 Introduction.- 11.2 Modelling imprecision in compositions.- 11.3 Analysis of sources of imprecision.- 11.4 Imprecision and tests of independence.- 11.5 Rounded or trace zeros.- 11.6 Essential zeros.- 11.7 Missing components.- 11.8 Bibliographic notes.- Problems.
12 Compositions in a covariate role.- 12.1 Introduction.- 12.2 Calibration.- 12.3 A before-and-after treatment problem.- 12.4 Experiments with mixtures.- 12.5 An application to firework mixtures.- 12.6 Classification from compositions.- 12.7 An application to geological classification.- 12.8 Bibliographic notes.- Problems.
13 Further distributions on the simplex.- 13.1 Some generalizations of the Dirichlet class.- 13.2 Some generalizations of the logistic normal classes.- 13.3 Recapitulation.- 13.4 The A^d(α, B) class.- 13.5 Maximum likelihood estimation.- 13.6 Neutrality and partition independence.- 13.7 Subcompositional independence.- 13.8 A generalized lognormal gamma distribution with compositional invariance.- 13.9 Discussion.- 13.10 Bibliographic notes.- Problems.
14 Miscellaneous problems.- 14.1 Introduction.- 14.2 Multi-way compositions.- 14.3 Multi-stage compositions.- 14.4 Multiple compositions.- 14.5 Kernel density estimation for compositional data.- 14.6 Compositional stochastic processes.- 14.7 Relation to Bayesian statistical analysis.- 14.8 Compositional and directional data.- Problems.
Appendices.- A Algebraic properties of elementary matrices.- B Bibliography.- C Computer software for compositional data analysis.- D Data sets.- Author index.
