A hierarchical model for compositional data analysis

This article introduces a hierarchical model for compositional data analysis. Our approach models source and mixture data simultaneously and accounts for several types of variation: measurement error in both the mixture and the source data; sampling variability in the source data; and variability in the mixing proportions themselves, which are generally of main interest. The method improves on some existing methods in that estimates of the mixing proportions (including their interval estimates) are guaranteed to lie in the range [0, 1]. We also show that the model can help in situations where appropriate source data are difficult to identify, especially when it is extended to include a covariate. We first study the likelihood surface of a base model for a simple example, and then add prior distributions to create a Bayesian model that allows analysis of more complex situations via Markov chain Monte Carlo sampling from the posterior distribution. Application of the model is illustrated with two real-data examples: one concerning chemical markers in plants, and another concerning water chemistry.
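The abstract mentions studying the likelihood surface of a base model before moving to the Bayesian version. The Python sketch below illustrates that idea for the simplest case. Its assumptions are ours, not the article's: two sources, a single mixture sample, independent normally distributed markers, source means estimated by sample means (with their standard errors carried into the mixture variance), and a known mixture measurement-error standard deviation `tau`. The function names and the data are hypothetical. Because the mixing proportion `p` is only evaluated on a grid over [0, 1], both the point estimate and the likelihood-based interval necessarily stay in that range.

```python
import numpy as np


def mixture_loglik(p, y, source_a, source_b, tau):
    """Log-likelihood of mixing proportion p in an ASSUMED two-source normal model.

    y[m] ~ Normal(p * mean_a[m] + (1 - p) * mean_b[m], var[m]), where
    var[m] = p^2 * se_a[m]^2 + (1 - p)^2 * se_b[m]^2 + tau^2 adds the
    uncertainty in the source sample means to the mixture measurement error.
    This is an illustration only, not the article's hierarchical model.
    """
    mean_a, mean_b = source_a.mean(axis=0), source_b.mean(axis=0)
    # Squared standard errors of the source means (one value per marker).
    se2_a = source_a.var(axis=0, ddof=1) / source_a.shape[0]
    se2_b = source_b.var(axis=0, ddof=1) / source_b.shape[0]
    mu = p * mean_a + (1.0 - p) * mean_b
    var = p**2 * se2_a + (1.0 - p) ** 2 * se2_b + tau**2
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mu) ** 2 / var))


def likelihood_surface(y, source_a, source_b, tau, n_grid=101):
    """Evaluate the log-likelihood on a grid of p restricted to [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    ll = np.array([mixture_loglik(p, y, source_a, source_b, tau) for p in grid])
    return grid, ll


def likelihood_interval(grid, ll, drop=1.92):
    """Approximate 95% interval: grid points within `drop` log-units of the maximum.

    Assumes a unimodal surface, so the retained points form one interval.
    """
    kept = grid[ll >= ll.max() - drop]
    return float(kept.min()), float(kept.max())


# Hypothetical data: 4 samples of 3 markers from each source, true p = 0.3.
deviations = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0],
                       [1.0, -1.0, 1.0], [-1.0, 1.0, -1.0]])
source_a = np.array([10.0, 0.0, 5.0]) + deviations
source_b = np.array([0.0, 10.0, 5.0]) + deviations
y = 0.3 * source_a.mean(axis=0) + 0.7 * source_b.mean(axis=0)

grid, ll = likelihood_surface(y, source_a, source_b, tau=0.5)
p_hat = float(grid[np.argmax(ll)])
lower, upper = likelihood_interval(grid, ll)
print(f"p_hat = {p_hat:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

The 1.92 cut-off is half the 95% quantile of a chi-squared distribution with one degree of freedom, the usual likelihood-ratio rule for an approximate 95% interval. Moving towards the setting the abstract describes would mean treating the source means, the measurement errors and the per-sample proportions as unknowns with prior distributions and sampling them by MCMC, which this grid approach does not attempt.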
