Imputation of values above an upper detection limit in compositional data

Abstract: Geochemical data frequently contain censored values. To obtain a complete and reliable data set, an imputation method for right-censored compositional data is proposed, based on the Tobit model. The algorithm is implemented as an iterative scheme of regressions in which the imputed values are updated step by step. The regressions can be either classical least-squares or robust, each with or without variable selection. The performance of the algorithm is evaluated on two real geochemical data sets under a range of scenarios. Compared to commonly used substitution methods, the proposed method yields improved data quality. The procedure is available in the R package robCompositions.
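The core idea behind a Tobit-type imputation step can be illustrated with a minimal sketch. It assumes that, on some transformed (for example log-ratio) scale, a regression has already produced a predicted mean `mu` and residual standard deviation `sigma` for an observation recorded only as "above the detection limit". A natural replacement is then the conditional mean of a normal variable given that it exceeds the limit. The function name `impute_above_limit` and its arguments are hypothetical; this is not the robCompositions implementation, and it omits both the iteration over variables and the robust regression option.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, computed via erfc for accuracy."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def impute_above_limit(mu, sigma, limit):
    """Conditional mean E[y | y > limit] for y ~ N(mu, sigma^2).

    mu, sigma: assumed regression prediction and residual scale
               (hypothetical inputs, on the transformed scale).
    limit:     upper detection limit on the same scale.
    """
    a = (limit - mu) / sigma
    tail = normal_upper_tail(a)
    if tail <= 0.0:
        # Tail probability underflowed: fall back to the limit itself.
        return limit
    # mu plus sigma times the inverse Mills ratio at a
    return mu + sigma * normal_pdf(a) / tail


if __name__ == "__main__":
    # A censored value whose prediction lies below the limit
    print(impute_above_limit(mu=1.0, sigma=0.5, limit=2.0))
```

In an iterative scheme of the kind the abstract describes, a step like this would be repeated, refitting the regression with the current imputed values until they stabilize. Unlike substitution by a fixed multiple of the detection limit, the imputed value here depends on the other parts of the composition through `mu`.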
