Model-free Variable Selection in Reproducing Kernel Hilbert Space

Variable selection is widely used in high-dimensional data analysis to identify the truly informative variables. Many variable selection methods have been developed under various model assumptions. Although success has been widely reported in the literature, their performance depends heavily on the validity of the assumed models, such as linear or additive models. This article introduces a model-free variable selection method based on learning gradient functions. The idea rests on the equivalence between a variable being informative and its corresponding gradient function being substantially non-zero. The proposed variable selection method is then formulated in a framework of learning gradients in a flexible reproducing kernel Hilbert space. Its key advantage is that it requires no explicit model assumption and allows for general variable effects. Asymptotic estimation and selection consistencies are established, which give the convergence rate of the estimated sparse gradients and ensure that the truly informative variables are identified with probability tending to one. The effectiveness of the proposed method is also supported by a variety of simulated examples and two real-life examples.
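The paper's estimator is not spelled out in this abstract, but the following minimal sketch illustrates the general gradient-learning idea it describes: represent each partial-derivative function in a Gaussian RKHS, fit the coefficients by matching locally weighted first-order Taylor differences of the responses, and declare a variable informative when its estimated gradient component has a substantially non-zero empirical norm. Everything here is an illustrative assumption rather than the paper's actual method: the function names, the bandwidths, the step size, and in particular the ridge penalty, which stands in for the sparsity-inducing penalty a selection-consistent estimator would use.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    # Pairwise squared distances, then the Gaussian kernel matrix.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def learn_gradients(X, y, bandwidth=1.0, weight_bw=1.0,
                    lam=1e-3, lr=1e-2, n_iter=500):
    """Estimate the gradient g(x) = (g_1(x), ..., g_p(x)) of the regression
    function, each component lying in a Gaussian RKHS.  Returns the (p, n)
    coefficient matrix C with g_l(x) = sum_i C[l, i] K(x_i, x), plus K."""
    n, p = X.shape
    K = gaussian_kernel(X, X, bandwidth)      # (n, n) kernel matrix
    W = gaussian_kernel(X, X, weight_bw)      # (n, n) locality weights
    D = X[None, :, :] - X[:, None, :]         # D[i, j] = x_j - x_i
    dy = y[None, :] - y[:, None]              # dy[i, j] = y_j - y_i
    C = np.zeros((p, n))
    for _ in range(n_iter):
        F = C @ K                             # F[:, i] = g(x_i)
        # First-order Taylor residuals: y_j - y_i - g(x_i)'(x_j - x_i)
        r = dy - np.einsum('li,ijl->ij', F, D)
        grad_F = -2.0 / n**2 * np.einsum('ij,ijl->li', W * r, D)
        # Ridge penalty lam * sum_l c_l' K c_l (in place of a sparsity penalty)
        grad_C = grad_F @ K + 2.0 * lam * C @ K
        C -= lr * grad_C
    return C, K

# Toy usage: only the first two variables are informative.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 5))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(80)
C, K = learn_gradients(X, y)
scores = np.sqrt(((C @ K) ** 2).mean(axis=1))  # empirical norm of each g_l
print(np.argsort(scores)[::-1])                # informative variables rank first
```

With a sparsity-inducing penalty the uninformative components would be shrunk exactly to zero; in this simplified ridge version one would instead threshold or rank the empirical norms `scores` to select variables.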
