Generalized additive models for location, scale and shape

Summary.  A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random‐effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.
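The core idea of the summary, modelling the scale parameter as well as the location by maximizing the likelihood with Fisher scoring, can be illustrated with a deliberately small sketch. It assumes a normal response with mean mu = b0 + b1*x and log(sigma) = g0 + g1*x, and it alternates one Fisher scoring step for each parameter. It has no smooth terms, random effects or penalty. The function names (`simulate`, `wls`, `fit_location_scale`) and the simulated data are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random


def simulate(n, seed=0):
    """Simulate y ~ Normal(mu, sigma) with mu = 1 + 2x and log(sigma) = -1 + 1.5x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [rng.gauss(1.0 + 2.0 * xi, math.exp(-1.0 + 1.5 * xi)) for xi in x]
    return x, y


def wls(x, z, w):
    """Weighted least squares of z on (1, x); returns (intercept, slope)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sz = sum(wi * zi for wi, zi in zip(w, z))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = sw * sxx - sx * sx
    slope = (sw * sxz - sx * sz) / det
    intercept = (sz - slope * sx) / sw
    return intercept, slope


def fit_location_scale(x, y, n_iter=50, tol=1e-8):
    """Alternate Fisher scoring steps for the mean and for log(sigma)."""
    n = len(y)
    ybar = sum(y) / n
    s0 = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)
    beta = (ybar, 0.0)            # coefficients of mu (identity link)
    gamma = (math.log(s0), 0.0)   # coefficients of log(sigma) (log link)
    old_dev = float("inf")
    dev = old_dev
    for _ in range(n_iter):
        # Location step: for the normal model the working response is y itself
        # and the working weights are 1 / sigma^2.
        sigma = [math.exp(gamma[0] + gamma[1] * xi) for xi in x]
        beta = wls(x, y, [1.0 / s ** 2 for s in sigma])
        mu = [beta[0] + beta[1] * xi for xi in x]

        # Scale step: the score with respect to eta = log(sigma) is
        # ((y - mu) / sigma)^2 - 1 and the expected information is 2.
        eta = [gamma[0] + gamma[1] * xi for xi in x]
        z = [e + 0.5 * (((yi - mi) / s) ** 2 - 1.0)
             for e, yi, mi, s in zip(eta, y, mu, sigma)]
        gamma = wls(x, z, [2.0] * n)

        # Global deviance = -2 * log-likelihood at the updated parameters.
        sigma = [math.exp(gamma[0] + gamma[1] * xi) for xi in x]
        dev = sum(math.log(2.0 * math.pi) + 2.0 * math.log(s)
                  + ((yi - mi) / s) ** 2
                  for yi, mi, s in zip(y, mu, sigma))
        if abs(old_dev - dev) < tol:
            break
        old_dev = dev
    return beta, gamma, dev


if __name__ == "__main__":
    x, y = simulate(2000, seed=1)
    beta, gamma, dev = fit_location_scale(x, y)
    print("mu coefficients:", beta)
    print("log(sigma) coefficients:", gamma)
    print("global deviance:", dev)
```

In this sketch each parameter has only a linear predictor. In the framework the summary describes, other distributions and additive smooth or random-effects terms would take its place, with the additive terms fitted by backfitting inside each scoring step.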
