Variable selection in functional additive regression models

This paper considers variable selection in regression models with functional covariates, possibly mixed with other types of variables (scalar, multivariate, directional, etc.). The proposed algorithm starts from a simple null model and sequentially selects a new variable to incorporate into the model, using the distance correlation proposed by Székely et al. (Ann Stat 35(6):2769–2794, 2007). For simplicity, this paper uses only additive models; the algorithm can nevertheless assess the type of contribution (linear, nonlinear, ...) of each variable. It has shown promising results in simulations and on real data sets.
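The forward selection idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It makes several assumptions that the abstract does not state: functional variables are curves sampled on a common grid and compared with the Euclidean distance, each selected variable is fitted by a simple Gaussian kernel smoother applied to the current residuals, the bandwidth is the median pairwise distance, and selection stops at a fixed, hypothetical `threshold` on the distance correlation rather than at a formal test. The function names (`pairwise_dist`, `dcor`, `kernel_smooth`, `forward_select`) are illustrative only.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distances between rows; a curve sampled on a grid is one row."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def dcor(dx, dy):
    """Sample distance correlation computed from two n x n distance matrices."""
    a, b = _double_center(dx), _double_center(dy)
    dcov2 = (a * b).mean()
    dvar2 = (a * a).mean() * (b * b).mean()
    if dvar2 <= 0:
        return 0.0  # a constant variable carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def kernel_smooth(d, r, h):
    """Nadaraya-Watson fit of r using Gaussian weights on the distances d."""
    w = np.exp(-0.5 * (d / h) ** 2)
    return (w @ r) / w.sum(axis=1)


def forward_select(candidates, y, threshold=0.2, max_vars=None):
    """Greedy selection: pick the candidate most dependent on the residuals.

    candidates: dict mapping a name to an array with one row per observation.
    threshold:  hypothetical stopping value, an assumption of this sketch.
    """
    y = np.asarray(y, dtype=float)
    dists = {name: pairwise_dist(x) for name, x in candidates.items()}
    resid = y - y.mean()  # null model: intercept only
    selected, remaining = [], set(dists)
    while remaining and (max_vars is None or len(selected) < max_vars):
        dr = pairwise_dist(resid)
        scores = {name: dcor(dists[name], dr) for name in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        selected.append(best)
        remaining.remove(best)
        d = dists[best]
        h = np.median(d[d > 0])  # assumed bandwidth rule
        resid = resid - kernel_smooth(d, resid, h)  # additive residual update
    return selected
```

Calling `forward_select({"curve": curves, "age": age}, y)` with `curves` an n x m array of discretized curves and `age` a length-n vector returns the names of the selected variables in order of entry. Because the distance correlation depends on the data only through pairwise distances, the same loop handles scalar, multivariate, and functional candidates, provided a suitable distance is supplied for each.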

[1] C. Mallows. More comments on Cp, 1995.

[2] Gábor J. Székely, et al. The distance correlation t-test of independence in high dimension, 2013, J. Multivar. Anal.

[3] Wenceslao González-Manteiga, et al. Generalized additive models for functional data, 2013.

[4] Fuhui Long, et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.

[6] C. O’Brien. Statistical Learning with Sparsity: The Lasso and Generalizations, 2016.

[7] H. Akaike. Maximum likelihood identification of Gaussian autoregressive moving average models, 1973.

[8] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[9] Maria L. Rizzo, et al. Variable selection in regression using maximal correlation and distance correlation, 2015.

[10] L. Xue. Consistent Variable Selection in Additive Models, 2009.

[11] R. Lyons. Distance covariance in metric spaces, 2011, 1106.5758.

[12] Hua Liang, et al. Estimation and Variable Selection for Semiparametric Additive Partial Linear Models, 2011, Statistica Sinica.

[13] Frédéric Ferraty, et al. Additive prediction and boosting for functional data, 2009, Comput. Stat. Data Anal.

[14] Hao Helen Zhang, et al. Component selection and smoothing in multivariate nonparametric regression, 2006, math/0702659.

[15] Gareth M. James, et al. Improved variable selection with Forward-Lasso adaptive shrinkage, 2011, 1104.3390.

[16] Hao Helen Zhang, et al. Component selection and smoothing in smoothing spline analysis of variance models -- COSSO, 2003.

[17] Gábor J. Székely, et al. The Energy of Data, 2017.

[18] Maria L. Rizzo, et al. Measuring and testing dependence by correlation of distances, 2007, 0803.4101.

[19] Guang Cheng, et al. Semiparametric regression models with additive nonparametric components and high dimensional parametric components, 2012, Comput. Stat. Data Anal.

[20] G. Schwarz. Estimating the Dimension of a Model, 1978.

[21] H. Zou. The Adaptive Lasso and Its Oracle Properties, 2006.

[22] P. Bühlmann, et al. Boosting with the L2-loss: regression and classification, 2001.

[23] Fang Yao, et al. Functional Additive Models, 2008.

[24] Hua Liang, et al. Estimation and Variable Selection for Generalized Additive Partial Linear Models, 2011, Annals of Statistics.

[25] M. Stone. Comments on Model Selection Criteria of Akaike and Schwarz, 1979.

[26] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.