Feature significance in generalized additive models

This paper develops inference for the significance of features, such as peaks and valleys, observed in additive modeling. It extends the SiZer-type methodology of Chaudhuri and Marron (1999) and Godtliebsen et al. (2002, 2004) to the case where the outcome is discrete. We consider two settings: additive models in which the main predictor of interest is univariate, where the features are peaks or valleys in the observed covariate effect, and models in which the main predictor of interest is geographical location, where the features are peaks, inclines, ridges and valleys. We work with low-rank radial spline smoothers to allow the handling of sparse designs and large sample sizes. Reducing the problem to a generalized linear mixed model (GLMM) framework enables derivation of simulation-based critical value approximations and guards against the problem of multiple inferences over a range of predictor values. Such a reduction also allows easy adjustment for confounders, including those with an unknown or complex effect on the outcome. A simulation study indicates that our method has satisfactory power. Finally, we illustrate our methodology on several data sets.
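The idea of a simulation-based critical value that guards against multiple inferences can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes that the estimated derivative of the smooth, evaluated on a grid of predictor values, is approximately Gaussian with a known covariance matrix. The function name `simultaneous_critical_value`, the grid, and the exponential covariance are hypothetical choices made only for illustration.

```python
import numpy as np

def simultaneous_critical_value(cov, alpha=0.05, n_sim=10000, seed=0):
    """Monte Carlo approximation to the (1 - alpha) quantile of max_i |Z_i|,
    where Z is a zero-mean Gaussian vector whose correlation matrix is the
    one implied by `cov` (assumed covariance of the gridded estimates)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)            # standardize to unit variances
    draws = rng.multivariate_normal(np.zeros(len(sd)), corr, size=n_sim)
    max_abs = np.abs(draws).max(axis=1)      # largest |z| across the grid, per draw
    return np.quantile(max_abs, 1 - alpha)

# Hypothetical covariance: 50 grid points with exponentially decaying correlation.
grid = np.linspace(0.0, 1.0, 50)
cov = 0.04 * np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.1)

crit = simultaneous_critical_value(cov)
print(f"pointwise critical value: 1.96, simultaneous critical value: {crit:.2f}")
```

Under these assumptions, a feature at a grid point would be flagged as significant only when the absolute standardized derivative estimate exceeds `crit`, which is larger than the pointwise value of 1.96 because it accounts for testing across the whole grid at once.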

[1]  Robert Fildes, et al.  The practice of econometrics: Classical and contemporary: Ernst R. Berndt (Addison-Wesley Publishing Company, Reading, Mass., 1991), pp. 702, $18.95, 1992.

[2]  Probal Chaudhuri, et al.  Statistical significance of features in digital images, 2004, Image Vis. Comput.

[3]  M. Wand, et al.  Feature Significance in Geostatistics, 2004.

[4]  Jean-Paul Chilès, et al.  Geostatistics, 2000, Technometrics.

[5]  M. P. Wand, et al.  Generalized additive distributed lag models: quantifying mortality displacement, 2000, Biostatistics.

[6]  J. Marron, et al.  SiZer for Exploration of Structures in Curves, 1999.

[7]  M. Wand, et al.  Multivariate Locally Weighted Least Squares Regression, 1994.

[8]  R. Munn, et al.  The Design of Air Quality Monitoring Networks, 1981.

[9]  Douglas W. Nychka, et al.  Design of Air-Quality Monitoring Networks, 1998.

[10]  Probal Chaudhuri, et al.  Significance in Scale Space for Bivariate Density Estimation, 2002.

[11]  R. Tibshirani, et al.  Discriminant Analysis by Gaussian Mixtures, 1996.

[12]  Jianqing Fan, et al.  Local polynomial kernel regression for generalized linear models and quasi-likelihood functions, 1995.

[13]  Ernst R. Berndt, et al.  The Practice of Econometrics: Classic and Contemporary, 1992.

[14]  J. Marron, et al.  Scale Space View of Curve Estimation, 2000.

[15]  B. Silverman, et al.  Nonparametric regression and generalized linear models, 1994.

[16]  R. Wolfinger, et al.  Generalized linear mixed models: a pseudo-likelihood approach, 1993.

[17]  N. Breslow, et al.  Approximate inference in generalized linear mixed models, 1993.

[18]  Ali S. Hadi, et al.  Finding Groups in Data: An Introduction to Cluster Analysis, 1991.