A review of spline function procedures in R

Background: With progress on both the theoretical and the computational fronts, spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user-friendly, well-documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies (STRATOS) initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R.

Methods: In this work, we focus on the R Language for Statistical Computing, which has become hugely popular statistical software. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data, we provide an introduction to spline modelling and an overview of the most popular spline functions.

Results: We present a series of simple scenarios of univariate data, where different basis functions are used to identify the correct functional form of an independent variable. Even with simple data, routines from different packages lead to different results.

Conclusions: This work illustrates the challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than the basis used. In fact, an experienced user will know how to obtain a reasonable outcome regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.
