A unified framework of constrained regression

Generalized additive models (GAMs) play an important role in modeling and understanding complex relationships in modern applied statistics. They allow for flexible, data-driven estimation of covariate effects. Yet researchers often have a priori knowledge about certain effects: these might be monotonic or periodic (cyclic), or they should fulfill boundary conditions. We propose a unified framework to incorporate such constraints into univariate and bivariate effect estimates as well as into varying coefficients. Because the framework is based on component-wise boosting, variables are selected intrinsically, and effects can be estimated under a wide range of distributional assumptions. Bootstrap confidence intervals for the effect estimates are derived to assess the models. We present three case studies from environmental sciences to illustrate the proposed seamless modeling framework. All discussed constrained effect estimates are implemented in the comprehensive R package mboost for model-based boosting.
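The core idea, imposing a shape constraint inside component-wise boosting while variables are selected along the way, can be illustrated with a minimal, dependency-free Python sketch. It is not the mboost implementation. As a simplifying assumption, each covariate's base-learner here is a plain isotonic (nondecreasing) least-squares fit computed with the pool-adjacent-violators algorithm, swapped in for the constrained smooth base-learners the framework actually uses. The helper names `pava`, `fit_isotonic`, and `boost`, the step length `nu`, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def pava(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted least-squares nondecreasing fit (pool-adjacent-violators)."""
    # Each block holds [weighted mean, total weight, number of pooled entries].
    blocks: List[list] = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def fit_isotonic(x: Sequence[float], r: Sequence[float]) -> List[float]:
    """Nondecreasing fit of residuals r on one covariate x (tied x values pooled)."""
    sums: Dict[float, float] = {}
    counts: Dict[float, int] = {}
    for xi, ri in zip(x, r):
        sums[xi] = sums.get(xi, 0.0) + ri
        counts[xi] = counts.get(xi, 0) + 1
    levels = sorted(sums)
    means = [sums[lv] / counts[lv] for lv in levels]
    fitted = pava(means, [counts[lv] for lv in levels])
    lookup = dict(zip(levels, fitted))
    return [lookup[xi] for xi in x]


def boost(
    X: Sequence[Sequence[float]], y: Sequence[float], n_iter: int = 100, nu: float = 0.1
) -> Tuple[float, List[List[float]], List[int]]:
    """Component-wise L2 boosting with one monotone base-learner per covariate."""
    n = len(y)
    offset = sum(y) / n
    f = [offset] * n
    components = [[0.0] * n for _ in X]  # fitted effect of each covariate
    selected: List[int] = []
    for _ in range(n_iter):
        # The negative gradient of the loss 0.5 * (y - f)^2 is the residual.
        r = [yi - fi for yi, fi in zip(y, f)]
        best_j, best_fit, best_rss = -1, [], float("inf")
        for j, xj in enumerate(X):
            g = fit_isotonic(xj, r)
            rss = sum((ri - gi) ** 2 for ri, gi in zip(r, g))
            if rss < best_rss:
                best_j, best_fit, best_rss = j, g, rss
        # Update only the best-fitting component, shrunken by step length nu.
        selected.append(best_j)
        for i in range(n):
            components[best_j][i] += nu * best_fit[i]
            f[i] += nu * best_fit[i]
    return offset, components, selected


if __name__ == "__main__":
    x1 = [i / 19 for i in range(20)]
    x2 = [(7 * i % 20) / 19 for i in range(20)]  # hypothetical uninformative covariate
    y = [v ** 3 for v in x1]  # monotone effect of x1 only
    offset, comps, sel = boost([x1, x2], y, n_iter=50, nu=0.1)
    print("selection counts:", [sel.count(j) for j in range(2)])
```

Because each update adds `nu` (> 0) times a nondecreasing function of a single covariate, every estimated effect stays nondecreasing across all iterations. Covariates whose base-learner never gives the best residual fit keep an effect of exactly zero, which is how component-wise boosting selects variables intrinsically.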
