Boosted Beta Regression

Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings measured on a bounded scale. In this paper, we consider beta regression, which generalizes logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
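
To make the idea of simultaneous estimation and variable selection concrete, the following is a minimal sketch of component-wise gradient boosting for the mean of a beta-distributed response. It is an illustration under stated assumptions, not the implementation used in the paper: the response is taken to follow Beta(mu * phi, (1 - mu) * phi) with a logit link for mu, the precision phi is treated as fixed and known, and every base-learner is a simple linear effect of one covariate. The names `digamma` and `boost_beta_mean` and all default values are hypothetical. The method described above additionally models the variance and allows nonlinear effects, which this sketch omits.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    shift = 0.0
    while x < 6.0:  # uses psi(x) = psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return shift + math.log(x) - 0.5 / x - series


def boost_beta_mean(X, y, phi=10.0, n_iter=200, step=0.1):
    """Component-wise boosting of a linear predictor for logit(mu).

    Assumptions (not from the paper): phi is fixed, all y lie strictly
    in (0, 1), and each base-learner is a least-squares fit on one column.
    Returns (intercept, coefficients).
    """
    n = len(X)
    rows = [list(r) + [1.0] for r in X]  # last column acts as intercept learner
    p = len(rows[0])
    ybar = sum(y) / n
    offset = math.log(ybar / (1.0 - ybar))  # start at logit of the mean response
    coef = [0.0] * p
    eta = [offset] * n
    for _ in range(n_iter):
        # Negative gradient of the negative beta log-likelihood w.r.t. eta.
        u = []
        for eta_i, y_i in zip(eta, y):
            mu = 1.0 / (1.0 + math.exp(-eta_i))
            u.append(phi * mu * (1.0 - mu) * (
                math.log(y_i / (1.0 - y_i))
                - digamma(mu * phi) + digamma((1.0 - mu) * phi)))
        # Fit every base-learner to u and keep only the best-fitting one.
        best_j, best_b, best_score = 0, 0.0, -1.0
        for j in range(p):
            sxx = sum(r[j] ** 2 for r in rows)
            if sxx == 0.0:
                continue
            sxu = sum(r[j] * u_i for r, u_i in zip(rows, u))
            score = sxu * sxu / sxx  # reduction in residual sum of squares
            if score > best_score:
                best_j, best_b, best_score = j, sxu / sxx, score
        # Small step along the selected component only.
        coef[best_j] += step * best_b
        eta = [e + step * best_b * r[best_j] for e, r in zip(eta, rows)]
    return offset + coef[-1], coef[:-1]
```

Because only one coefficient is updated per iteration, covariates that are never selected keep a coefficient of exactly zero, so stopping the loop early performs variable selection as a by-product of fitting. In this sketch the step size should stay small relative to phi, since the gradient scales with the precision.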
