Count transformation models

The effect of explanatory environmental variables on a species' distribution is often assessed using a count regression model. Poisson generalized linear models or negative binomial models are common, but the traditional approach of modelling the mean after log or square root transformation remains popular and in some cases is even advocated. We propose a novel framework of linear models for count data.

Similar to the traditional approach, the new models apply a transformation to count responses; however, this transformation is estimated from the data and not defined a priori. In contrast to simple least‐squares fitting and in line with Poisson or negative binomial models, the exact discrete likelihood is optimized for parameter estimation and inference. Simple interpretation of effects in the linear predictors is possible. Count transformation models provide a new approach to regressing count data in a distribution‐free yet fully parametric fashion, obviating the need to commit a priori to a specific parametric family of distributions or to a specific transformation. The models are a generalization of discrete Weibull models for counts and are thus able to handle over‐ and underdispersion.

We demonstrate empirically that the models are more flexible than Poisson or negative binomial models but still maintain interpretability of multiplicative effects. A re‐analysis of deer–vehicle collisions and the results of artificial simulation experiments provide evidence of the practical applicability of the model framework.

In ecological studies, uncertainties regarding whether and how to transform count data can be resolved in the framework of count transformation models, which were designed to simultaneously estimate an appropriate transformation and the linear effects of environmental variables by maximizing the exact count log‐likelihood. The application of data‐driven transformations allows over‐ and underdispersion to be addressed in a model‐based approach. Models in this class can be compared to Poisson or negative binomial models using the in‐ or out‐of‐sample log‐likelihood. Extensions to nonlinear additive or interaction effects, correlated observations, hurdle‐type models and other more complex situations are possible. A free software implementation is available in the cotram add‐on package to the R system for statistical computing.
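To make the idea of an exact discrete likelihood concrete, the following is a minimal sketch and not the cotram implementation. It assumes a logistic link, a sign convention in which the linear predictor is subtracted from the transformed count, and a hypothetical fixed transformation `h` with made-up coefficients `a` and `b`; in the models described above the transformation is estimated from the data rather than fixed. All function names here are illustrative.

```python
import math


def logistic_cdf(z):
    """Standard logistic distribution function (assumed link for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def h(y, a=-1.0, b=1.5):
    """Hypothetical monotone increasing transformation of a count y >= 0.

    The coefficients a and b are placeholders; a count transformation model
    would estimate the shape of this function from the data.
    """
    return a + b * math.log(y + 1.0)


def count_pmf(y, eta, cdf=logistic_cdf):
    """P(Y = y | x) as a difference of the model distribution function
    evaluated at y and y - 1, with eta the linear predictor x'beta."""
    upper = cdf(h(y) - eta)
    lower = cdf(h(y - 1) - eta) if y > 0 else 0.0
    return upper - lower


def log_likelihood(counts, etas):
    """Exact discrete log-likelihood: sum of log point probabilities."""
    return sum(math.log(count_pmf(y, eta)) for y, eta in zip(counts, etas))


if __name__ == "__main__":
    # Toy data: three observed counts with made-up linear predictors.
    print(log_likelihood([0, 2, 5], [0.1, 0.3, -0.2]))
```

Because each point probability is a difference of distribution function values at neighbouring counts, the probabilities telescope and no continuity correction or rounding of a continuous fit is needed. Maximizing this quantity jointly over the transformation and the regression coefficients is the step the sketch leaves out.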
