Kernel Meets Sieve: Post-Regularization Confidence Bands for Sparse Additive Model

Abstract We develop a novel procedure for constructing confidence bands for components of a sparse additive model. Our procedure is based on a new kernel-sieve hybrid estimator that combines the two most popular nonparametric estimation methods in the literature, kernel regression and the spline method, and is of interest in its own right. Existing methods for fitting sparse additive models are primarily based on sieve estimators, while the literature on confidence bands for nonparametric models is primarily based on kernel or local polynomial estimators. Our kernel-sieve hybrid estimator combines the best of both worlds and allows us to provide a simple procedure for constructing confidence bands in high-dimensional sparse additive models. We prove that the confidence bands are asymptotically honest by studying their approximation by a Gaussian process. Thorough numerical results on both synthetic data and real-world neuroscience data are provided to demonstrate the efficacy of the theory. Supplementary materials for this article are available online.
