A Dirichlet Regression Model for Compositional Data with Zeros

Compositional data arise in many fields, such as economics, archaeometry, ecology, geology and political science. Regression with a compositional dependent variable is usually carried out either via a log-ratio transformation of the composition or via the Dirichlet distribution. However, when the data contain zero values, neither approach is readily applicable: the logarithm of zero is undefined, and the Dirichlet density is defined only for strictly positive components. Remedies for this problem exist, but most of them rely on replacing the zero values. In this paper we adapt Dirichlet regression, that is, the Dirichlet distribution with covariates, so that zero values are allowed in the data and no observed value has to be modified. To do so, we modify the log-likelihood of the Dirichlet distribution to account for zero values. Examples and simulation studies illustrate the performance of the zero-adjusted Dirichlet regression.
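The abstract does not spell out how the log-likelihood is modified. The sketch below shows one plausible reading, under loudly stated assumptions: each observation contributes a Dirichlet log-density evaluated only on its non-zero components, using the matching subset of parameters. The function names, the softmax mean link with the first component as reference, and the precision parameter `phi` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def subset_dirichlet_logpdf(y, alpha):
    """Dirichlet log-density computed only on the non-zero components of y.

    Assumption: zero components are dropped together with their alphas.
    The remaining components still sum to 1, so the density is well defined.
    """
    kept = [(yj, aj) for yj, aj in zip(y, alpha) if yj > 0]
    if len(kept) < 2:
        raise ValueError("need at least two non-zero components")
    alpha_sum = sum(aj for _, aj in kept)
    log_norm = math.lgamma(alpha_sum) - sum(math.lgamma(aj) for _, aj in kept)
    return log_norm + sum((aj - 1.0) * math.log(yj) for yj, aj in kept)

def regression_alphas(x, betas, phi):
    """Hypothetical link: softmax means (first component as reference) times phi."""
    etas = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    top = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [phi * w / total for w in weights]

def zero_adjusted_loglik(Y, X, betas, phi):
    """Sum of per-observation subset log-densities; no zero is replaced."""
    return sum(
        subset_dirichlet_logpdf(y, regression_alphas(x, betas, phi))
        for y, x in zip(Y, X)
    )
```

In this sketch an observation such as `[0.4, 0.6, 0.0]` is scored as a two-part composition, so its zero is neither imputed nor perturbed. A fitting routine would maximise `zero_adjusted_loglik` over `betas` and `phi` with a numerical optimiser.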
