A Bayesian hierarchical model of compositional data with zeros: classification and evidence evaluation of forensic glass

A Bayesian hierarchical model is proposed for compositional data containing a large proportion of zeros. Two data transformations are compared: the commonly used additive log-ratio (alr) transformation for compositional data, and the square root of the compositional ratios. For these data, the square root transformation stabilised the variability better and had no difficulty handling the many zeros. Two approaches to the zeros are implemented: data augmentation and a composite model. The data augmentation approach treats zero values as rounded zeros, i.e. traces of components below the limit of detection, and updates them with non-zero values. This is preferable to simply adding a constant to each zero because the zeros are updated as part of the modelling procedure, which reduces the artificial correlation that replacement introduces. However, because the detection limit is small, it does not necessarily alleviate the problem of a point mass very close to zero. The composite model approach treats zero components as absent from a composition. The data are split into subsets according to the presence or absence of certain components, and the resulting data configurations are modelled separately. The models are applied to a database of the elemental compositions of forensic glass fragments, which has many levels of variability and covers various use types. The main purposes of the model are (i) to derive expressions for the posterior predictive probabilities of newly observed glass fragments in order to infer their use type (classification) and (ii) to compute the evidential value of glass fragments under two complementary propositions about their source (forensic evidence evaluation).
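The contrast between the two transformations, and the splitting step of the composite model, can be illustrated with a minimal sketch. It assumes that the last part of each composition is the non-zero reference component of the ratios; the function names and the toy compositions are hypothetical and are not taken from the glass database or from the model itself.

```python
import math
from collections import defaultdict


def alr(composition):
    """Additive log-ratio: log of each part divided by the last (reference) part."""
    *parts, ref = composition
    # math.log(0.0) raises ValueError, so a zero part makes the alr undefined.
    return [math.log(p / ref) for p in parts]


def sqrt_ratio(composition):
    """Square root of each compositional ratio; a zero part simply maps to 0.0."""
    *parts, ref = composition
    return [math.sqrt(p / ref) for p in parts]


def split_by_zero_pattern(compositions):
    """Group compositions by which parts are present (True) or absent (False)."""
    groups = defaultdict(list)
    for comp in compositions:
        pattern = tuple(p > 0 for p in comp)
        groups[pattern].append(comp)
    return dict(groups)


# Hypothetical four-part compositions (illustrative values only).
fragments = [
    (0.70, 0.25, 0.00, 0.05),
    (0.65, 0.20, 0.10, 0.05),
    (0.72, 0.23, 0.00, 0.05),
]

print(sqrt_ratio(fragments[0]))          # defined even though one part is zero
print(split_by_zero_pattern(fragments))  # two data configurations
```

In this sketch each group returned by `split_by_zero_pattern` corresponds to one data configuration that would be modelled separately, whereas calling `alr` on a composition with a zero part fails until the zero has been replaced.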
Simulation studies using cross-validation are carried out to assess both modelling approaches. Both perform well at classifying glass fragments of use types bulb, headlamp and container, but less well at classifying car and building windows. The composite model approach marginally outperforms the data augmentation approach at the classification task, and both outperform support vector machines (SVM). Both approaches also perform well when evaluating the evidential value of glass fragments, with false negative and false positive error rates below 5%. The results for glass classification and evidence evaluation are an improvement over existing methods. Assessment of the models in the evidence evaluation simulation study also leads to a restriction on the reported strength of this type of evidence. To prevent strong support in favour of the wrong proposition, it is recommended that this glass evidence should provide, at most, moderately strong support in favour of a proposition. The classification and evidence evaluation procedures are implemented in a web application, which outputs the corresponding results for a given set of elemental composition measurements. The web application provides a quick and easy-to-use tool for forensic scientists who deal with this type of forensic evidence in real-life casework.
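The error rates and the restriction on reported strength can be sketched as follows, under stated assumptions. The sketch assumes the evidential value is expressed as a likelihood ratio (LR), with LR > 1 supporting the same-source proposition and LR < 1 supporting the different-source proposition. The bound of 100 is a hypothetical placeholder for "moderately strong support", and the LR values are invented for illustration; neither is taken from the study.

```python
def error_rates(lr_same_source, lr_diff_source):
    """Return (false negative rate, false positive rate) for two sets of LRs.

    A false negative is a same-source comparison with LR < 1;
    a false positive is a different-source comparison with LR > 1.
    """
    fn = sum(lr < 1 for lr in lr_same_source) / len(lr_same_source)
    fp = sum(lr > 1 for lr in lr_diff_source) / len(lr_diff_source)
    return fn, fp


def cap_lr(lr, bound=100.0):
    """Limit the reported LR to the interval [1/bound, bound].

    The default bound is an assumption made for this sketch only.
    """
    return min(max(lr, 1.0 / bound), bound)


# Invented likelihood ratios (illustrative values only).
same_source = [5.0, 0.5, 20.0, 100.0]
diff_source = [0.1, 2.0, 0.01, 0.2]

print(error_rates(same_source, diff_source))
print([cap_lr(lr) for lr in (1e6, 5.0, 1e-6)])
```

Capping symmetrically on both sides limits the damage from a misleading LR in either direction, which is the motivation given above for restricting the reported strength.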
