The excess-zero problem in soil animal count data and choice of appropriate models for statistical inference

Summary: Recent studies show that soil animal count data are characterized by excess zeros and overdispersion, which violate the assumptions of standard statistical tests. Despite this, analyses have consisted mainly of non-parametric tests and log-normal least-squares regression (i.e. ANOVA). Failure to accommodate zero inflation in count data can result in biased estimation of ecological effects, jeopardizing the integrity of the scientific inference. The objective of this study was to compare statistical models for the analysis of soil animal count data and to suggest appropriate methods for estimating abundance. The log-normal regression model, linear mixed model (LMM), standard Poisson, Poisson with correction for overdispersion (PCO), negative binomial distribution (NBD), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models were compared using 12 count data sets of earthworms, millipedes, centipedes, beetles, ants and termites from soils under miombo woodland and agroforestry systems in eastern Zambia. The NBD with covariates described the data better than the standard Poisson, ZIP and ZINB in nine of the 12 cases. The ZIP and ZINB models with covariates gave the best description of earthworm counts from the miombo and millipede counts from agroforestry, respectively. In all cases, the ZIP model was better than the standard Poisson model. The ZINB was inferior to the NBD except for earthworm counts from the miombo and millipede counts from agroforestry. Significance tests based on the PCO, ZIP, NBD and ZINB were more conservative than those based on the standard Poisson model. The 95% confidence intervals computed using the PCO, ZIP, NBD and ZINB were also wider than those computed using least squares, the LMM and the standard Poisson model.
It is concluded that, for comparisons among habitat types, land-use categories or treatments, the NBD, ZIP and ZINB perform better than the log-normal and Poisson models. Given the excess-zero problem and the significant deviation of soil animal counts from the assumptions of normality and homoscedasticity, the log-normal regression model is inappropriate. Routine application of the log-normal regression model and non-parametric tests to soil animal count data with many zeros should therefore be discouraged.
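
The comparison between the standard Poisson and the ZIP model can be illustrated with a minimal sketch. It is not the authors' analysis: the counts are made up for illustration, the models are intercept-only (no covariates), the ZIP parameters are fitted with a simple EM loop, and AIC is assumed as the comparison criterion, since the summary does not state which one was used. All function names are hypothetical.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of an intercept-only Poisson model with mean lam."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood of an intercept-only ZIP model.

    pi is the probability of a structural (excess) zero, lam the Poisson mean.
    """
    total = 0.0
    for y in counts:
        pois = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
        if y == 0:
            total += math.log(pi + (1 - pi) * pois)
        else:
            total += math.log((1 - pi) * pois)
    return total

def fit_zip_em(counts, n_iter=500):
    """Fit pi and lam by EM; assumes at least one positive count."""
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    pi = 0.5 * n_zero / n              # crude starting values
    lam = total / (n * (1 - pi))
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson mean
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical counts per soil sample (illustration only): 12 zeros out of 20.
counts = [0] * 12 + [1, 2, 3, 2, 4, 1, 3, 5]

lam_pois = sum(counts) / len(counts)   # Poisson MLE is the sample mean
pi_zip, lam_zip = fit_zip_em(counts)

aic_pois = aic(poisson_loglik(counts, lam_pois), 1)
aic_zip = aic(zip_loglik(counts, pi_zip, lam_zip), 2)

print(f"Poisson: lambda={lam_pois:.3f}, AIC={aic_pois:.2f}")
print(f"ZIP:     pi={pi_zip:.3f}, lambda={lam_zip:.3f}, AIC={aic_zip:.2f}")
```

For these made-up counts the ZIP model has the lower AIC, because a Poisson distribution with the same mean predicts far fewer zeros than were observed. A real analysis would add covariates and also fit the NBD and ZINB, which the summary reports as the better choice in most of the 12 data sets.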
