Ordinal regression models for zero-inflated and/or over-dispersed count data

Count data commonly arise in the natural sciences, but adequately modeling these data is challenging because of zero-inflation and over-dispersion. Although multiple parametric modeling approaches have been proposed, there is no consensus on how to choose the best model. In this article, we propose an ordinal regression model (MN) as a default model for count data, given that this model is shown to fit data arising from several types of discrete distributions well. We extend this model to allow for automatic model selection (MN-MS) and show that the MN-MS model yields superior inference compared with using the full model or more traditional model selection approaches. The MN-MS model is used to determine how the human biting rates of mosquito species known to be able to transmit malaria are influenced by environmental factors in the Peruvian Amazon. The MN-MS model had among the best fit and out-of-sample predictive skill of all models. While Anopheles darlingi is strongly associated with highly anthropized landscapes, all the other mosquito species had higher mean biting rates in landscapes with a lower fraction of exposed soil and urban area, revealing a striking shift in species composition. We believe that the MN and MN-MS models are valuable additions to the modeling toolkit employed by environmental modelers and quantitative ecologists.
