The Statistical Analysis of Compositional Data

1 Compositional data: some challenging problems.- 1.1 Introduction.- 1.2 Geochemical compositions of rocks.- 1.3 Sediments at different depths.- 1.4 Ternary diagrams.- 1.5 Partial analyses and subcompositions.- 1.6 Supervisory behaviour.- 1.7 Household budget surveys.- 1.8 Steroid metabolite patterns in adults and children.- 1.9 Activity patterns of a statistician.- 1.10 Calibration of white-cell compositions.- 1.11 Fruit evaluation.- 1.12 Firework mixtures.- 1.13 Clam ecology.- 1.14 Bibliographic notes.- Problems.
2 The simplex as sample space.- 2.1 Choice of sample space.- 2.2 Compositions and simplexes.- 2.3 Spaces, vectors, matrices.- 2.4 Bases and compositions.- 2.5 Subcompositions.- 2.6 Amalgamations.- 2.7 Partitions.- 2.8 Perturbations.- 2.9 Geometrical representations of compositional data.- 2.10 Bibliographic notes.- Problems.
3 The special difficulties of compositional data analysis.- 3.1 Introduction.- 3.2 High dimensionality.- 3.3 Absence of an interpretable covariance structure.- 3.4 Difficulty of parametric modelling.- 3.5 The mixture variation difficulty.- 3.6 Bibliographic notes.- Problems.
4 Covariance structure.- 4.1 Fundamentals.- 4.2 Specification of the covariance structure.- 4.3 The compositional variation array.- 4.4 Recovery of the compositional variation array from the crude mean vector and covariance matrix.- 4.5 Subcompositional analysis.- 4.6 Matrix specifications of covariance structures.- 4.7 Some important elementary matrices.- 4.8 Relationships between the matrix specifications.- 4.9 Estimated matrices for hongite compositions.- 4.10 Logratios and logcontrasts.- 4.11 Covariance structure of a basis.- 4.12 Commentary.- 4.13 Bibliographic notes.- Problems.
5 Properties of matrix covariance specifications.- 5.1 Logratio notation.- 5.2 Logcontrast variances and covariances.- 5.3 Permutations.- 5.4 Properties of P and QP matrices.- 5.5 Permutation invariants involving Σ.- 5.6 Covariance matrix inverses.- 5.7 Subcompositions.- 5.8 Equivalence of characteristics of Τ, Σ, Γ.- 5.9 Logratio-uncorrelated compositions.- 5.10 Isotropic covariance structures.- 5.11 Bibliographic notes.- Problems.
6 Logistic normal distributions on the simplex.- 6.1 Introduction.- 6.2 The additive logistic normal class.- 6.3 Density function.- 6.4 Moment properties.- 6.5 Composition of a lognormal basis.- 6.6 Class-preserving properties.- 6.7 Conditional subcompositional properties.- 6.8 Perturbation properties.- 6.9 A central limit theorem.- 6.10 A characterization by logcontrasts.- 6.11 Relationships with the Dirichlet class.- 6.12 Potential for statistical analysis.- 6.13 The multiplicative logistic normal class.- 6.14 Partitioned logistic normal classes.- 6.15 Some notation.- 6.16 Bibliographic notes.- Problems.
7 Logratio analysis of compositions.- 7.1 Introduction.- 7.2 Estimation of μ and Σ.- 7.3 Validation: tests of logistic normality.- 7.4 Hypothesis testing strategy and techniques.- 7.5 Testing hypotheses about μ and Σ.- 7.6 Logratio linear modelling.- 7.7 Testing logratio linear hypotheses.- 7.8 Further aspects of logratio linear modelling.- 7.9 An application of logratio linear modelling.- 7.10 Predictive distributions, atypicality indices and outliers.- 7.11 Statistical discrimination.- 7.12 Conditional compositional modelling.- 7.13 Bibliographic notes.- Problems.
8 Dimension-reducing techniques.- 8.1 Introduction.- 8.2 Crude principal component analysis.- 8.3 Logcontrast principal component analysis.- 8.4 Applications of logcontrast principal component analysis.- 8.5 Subcompositional analysis.- 8.6 Applications of subcompositional analysis.- 8.7 Canonical component analysis.- 8.8 Bibliographic notes.- Problems.
9 Bases and compositions.- 9.1 Fundamentals.- 9.2 Covariance relationships.- 9.3 Principal and canonical component comparisons.- 9.4 Distributional relationships.- 9.5 Compositional invariance.- 9.6 An application to household budget analysis.- 9.7 An application to clinical biochemistry.- 9.8 Reappraisal of an early shape and size analysis.- 9.9 Bibliographic notes.- Problems.
10 Subcompositions and partitions.- 10.1 Introduction.- 10.2 Complete subcompositional independence.- 10.3 Partitions of order 1.- 10.4 Ordered sequences of partitions.- 10.5 Caveat.- 10.6 Partitions of higher order.- 10.7 Bibliographic notes.- Problems.
11 Irregular compositional data.- 11.1 Introduction.- 11.2 Modelling imprecision in compositions.- 11.3 Analysis of sources of imprecision.- 11.4 Imprecision and tests of independence.- 11.5 Rounded or trace zeros.- 11.6 Essential zeros.- 11.7 Missing components.- 11.8 Bibliographic notes.- Problems.
12 Compositions in a covariate role.- 12.1 Introduction.- 12.2 Calibration.- 12.3 A before-and-after treatment problem.- 12.4 Experiments with mixtures.- 12.5 An application to firework mixtures.- 12.6 Classification from compositions.- 12.7 An application to geological classification.- 12.8 Bibliographic notes.- Problems.
13 Further distributions on the simplex.- 13.1 Some generalizations of the Dirichlet class.- 13.2 Some generalizations of the logistic normal classes.- 13.3 Recapitulation.- 13.4 The Ad(α,B) class.- 13.5 Maximum likelihood estimation.- 13.6 Neutrality and partition independence.- 13.7 Subcompositional independence.- 13.8 A generalized lognormal gamma distribution with compositional invariance.- 13.9 Discussion.- 13.10 Bibliographic notes.- Problems.
14 Miscellaneous problems.- 14.1 Introduction.- 14.2 Multi-way compositions.- 14.3 Multi-stage compositions.- 14.4 Multiple compositions.- 14.5 Kernel density estimation for compositional data.- 14.6 Compositional stochastic processes.- 14.7 Relation to Bayesian statistical analysis.- 14.8 Compositional and directional data.- Problems.
Appendices.- A Algebraic properties of elementary matrices.- B Bibliography.- C Computer software for compositional data analysis.- D Data sets.- Author index.
