Profiled Forward Regression for Ultrahigh Dimensional Variable Screening in Semiparametric Partially Linear Models

For model selection in partially linear models, we develop a profiled forward regression (PFR) algorithm for ultrahigh dimensional variable screening. The PFR algorithm combines nonparametric profiling with forward regression. This combination allows us to obtain a uniform bound for the absolute difference between the profiled predictors and their estimators. Based on this finding, we show that the PFR algorithm identifies all relevant variables within a small number of steps. Numerical studies illustrate the performance of the proposed method.
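The combination of profiling and forward regression can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions not stated in the abstract: the model is taken to be Y = X'β + g(U) + ε with a scalar index variable U, profiling uses a Nadaraya-Watson smoother with a Gaussian kernel and a user-chosen bandwidth `h`, and the forward search runs for a fixed number of steps `n_steps` rather than using a data-driven stopping rule. The function names `nw_smooth` and `profiled_forward_regression` are hypothetical.

```python
import numpy as np


def nw_smooth(u, v, h):
    """Nadaraya-Watson estimate of E[v | u] at the sample points.

    Assumption: Gaussian kernel with bandwidth h. `v` may be a vector
    (length n) or a matrix (n x p); columns are smoothed independently.
    """
    d = (u[:, None] - u[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    w /= w.sum(axis=1, keepdims=True)  # each row of weights sums to 1
    return w @ v


def profiled_forward_regression(y, X, u, h, n_steps):
    """Sketch of profiling followed by forward regression.

    Step 1 (profiling): subtract kernel estimates of E[y | u] and
    E[X_j | u] so that the nonparametric component is approximately removed.
    Step 2 (forward regression): greedily add the profiled predictor that
    most reduces the residual sum of squares.
    Returns the selected column indices in order of entry.
    """
    y_t = y - nw_smooth(u, y, h)
    X_t = X - nw_smooth(u, X, h)

    selected = []
    for _ in range(n_steps):
        best_j, best_rss = None, np.inf
        for j in range(X_t.shape[1]):
            if j in selected:
                continue
            cols = X_t[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(cols, y_t, rcond=None)
            rss = np.sum((y_t - cols @ coef) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected
```

Refitting by least squares for every candidate keeps the sketch short but costs one regression per candidate per step; for a large number of predictors, an incremental update of the residuals would be the more practical choice.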
