Sparse Partially Linear Additive Models

The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing each to have either a linear or nonlinear effect on the response. However, which features to treat as linear and which as nonlinear is typically assumed to be known in advance. Thus, to make the GPLAM viable in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which of those features to treat nonlinearly. We introduce the sparse partially linear additive model (SPLAM), which combines model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the lasso and sparse additive models. Through a statistical oracle inequality and a thorough simulation study, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional (p ≫ N) setting. We develop efficient algorithms and apply them to real datasets with half a million samples and over 45,000 features, achieving excellent predictive performance. Supplementary materials for this article are available online.
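To make the "single convex optimization problem" concrete, the following is a minimal sketch, not the paper's implementation. It assumes a squared-error loss, substitutes a low-degree polynomial basis for spline bases, and uses a hierarchical group-lasso penalty of the form λ Σ_j [(1−α)‖β_j‖₂ + α‖β_{j,nonlinear}‖₂], under which an entire coefficient block can be zeroed (the feature is excluded) or only its nonlinear part zeroed (the feature enters linearly), mirroring the lasso-to-sparse-additive-models bridge described above. Because the nonlinear group is nested inside the full block, the proximal operator of this penalty can be computed by composing two group soft-thresholds. The function and parameter names (splam_fit, lam, alpha) are illustrative.

    import numpy as np

    def group_soft_threshold(v, t):
        # Shrink v toward zero by t in Euclidean norm (block soft-threshold).
        nrm = np.linalg.norm(v)
        return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

    def splam_fit(X, y, lam=0.1, alpha=0.5, degree=3, n_iter=500):
        # Sketch of a SPLAM-style fit by proximal gradient descent.
        # Feature j owns `degree` basis columns: column 0 is the linear
        # term; the rest are nonlinear terms (polynomial stand-in for splines).
        N, p = X.shape
        Xs = (X - X.mean(0)) / X.std(0)
        B = np.hstack([Xs[:, [j]] ** np.arange(1, degree + 1) for j in range(p)])
        B = (B - B.mean(0)) / B.std(0)          # center/scale basis columns
        yc = y - y.mean()
        beta = np.zeros(p * degree)
        step = 1.0 / (np.linalg.norm(B, 2) ** 2 / N)   # 1/L for the smooth part
        for _ in range(n_iter):
            grad = B.T @ (B @ beta - yc) / N    # gradient of (1/2N)||y - B beta||^2
            z = beta - step * grad
            for j in range(p):                  # hierarchical prox, block by block
                blk = slice(j * degree, (j + 1) * degree)
                b = z[blk].copy()
                # Nested (nonlinear) group first, then the enclosing block:
                b[1:] = group_soft_threshold(b[1:], step * lam * alpha)
                z[blk] = group_soft_threshold(b, step * lam * (1.0 - alpha))
            beta = z
        return beta.reshape(p, degree)          # row j: basis coefficients of feature j

A toy run on synthetic data in which one feature acts linearly, one nonlinearly, and one is noise:

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = 2.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + rng.normal(scale=0.3, size=500)
    coef = splam_fit(X, y, lam=0.05, alpha=0.5)
    for j, b in enumerate(coef):
        kind = ("excluded" if np.allclose(b, 0)
                else "linear" if np.allclose(b[1:], 0) else "nonlinear")
        print(f"feature {j}: {kind}")

Zeroing an entire block excludes a feature, while zeroing only its trailing entries keeps the feature with a purely linear effect; this is how a single penalty performs both selection tasks at once. The plain ISTA-style loop above is purely for exposition; the efficient algorithms the abstract refers to would be considerably faster at the scale reported there.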
