Sparse Modal Additive Model

Sparse additive models have been successfully applied to high-dimensional data analysis due to the flexibility and interpretability of their representation. However, existing methods are usually formulated with the least-squares loss and therefore learn the conditional mean, which is sensitive to non-Gaussian noise, e.g., skewed noise, heavy-tailed noise, and outliers. To tackle this problem, we propose a new robust regression method, called the sparse modal additive model (SpMAM), which integrates the modal regression metric, a data-dependent hypothesis space, and a weighted ℓq,1-norm regularizer (q ≥ 1) into additive models. Specifically, the modal regression metric ensures robustness to complex noise by learning the conditional mode, the data-dependent hypothesis space provides adaptivity through a sample-based representation, and the ℓq,1-norm regularizer supports interpretability through sparse variable selection. In theory, the proposed SpMAM enjoys statistical guarantees of asymptotic consistency for regression estimation and variable selection simultaneously. Experimental results on both synthetic and real-world benchmark data sets validate the effectiveness and robustness of the proposed model.
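
To make the three ingredients concrete, the sketch below combines them in their simplest instantiations: a per-variable Gaussian-kernel expansion anchored on the training samples (a data-dependent hypothesis space), a Gaussian correntropy-type loss whose maximizer targets the conditional mode, the ℓ2,1 group penalty (the q = 2 case of the ℓq,1 regularizer), and a half-quadratic reweighting loop with proximal-gradient updates. This is an illustrative, assumption-laden sketch, not the paper's algorithm; the function name fit_spmam and all hyperparameter defaults are hypothetical.

```python
# Minimal sketch of an SpMAM-style estimator (assumptions noted above; not
# the authors' exact method).
import numpy as np

def gaussian_features(X, centers, width):
    """One Gaussian-kernel feature block per input variable (the expansion is
    anchored on the training points, i.e., a data-dependent hypothesis space)."""
    return [np.exp(-(X[:, j:j + 1] - centers[:, j][None, :]) ** 2 / (2 * width ** 2))
            for j in range(X.shape[1])]

def fit_spmam(X, y, lam=0.1, sigma=0.5, width=0.3, n_outer=20, n_inner=50):
    """Half-quadratic + proximal-gradient sketch of a modal additive estimator."""
    n, d = X.shape
    Phi = np.hstack(gaussian_features(X, X, width))  # (n, n*d) design matrix
    m = n                                            # coefficients per variable
    beta = np.zeros(n * d)
    lr = n / (np.linalg.norm(Phi, 2) ** 2)           # step size from a Lipschitz bound
    for _ in range(n_outer):
        # Half-quadratic reweighting for the Gaussian (correntropy-type) modal
        # loss: residuals near the conditional mode keep weight ~1, while gross
        # outliers are exponentially down-weighted.
        w = np.exp(-(y - Phi @ beta) ** 2 / (2 * sigma ** 2))
        for _ in range(n_inner):
            # Gradient step on the weighted least-squares surrogate ...
            beta = beta + lr * (Phi.T @ (w * (y - Phi @ beta))) / n
            # ... followed by the l_{2,1} proximal map: group soft-thresholding
            # with one group per input variable, so whole variables can be zeroed.
            for j in range(d):
                g = beta[j * m:(j + 1) * m]
                shrink = max(0.0, 1.0 - lr * lam / (np.linalg.norm(g) + 1e-12))
                beta[j * m:(j + 1) * m] = shrink * g
    return beta

# Toy check: y depends on x0 only, with heavy-tailed noise and gross outliers.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 4))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_t(df=2, size=80)
y[::10] += 5.0
beta = fit_spmam(X, y)
print([round(float(np.linalg.norm(beta[j * 80:(j + 1) * 80])), 3) for j in range(4)])
# Expected pattern: a large norm for the first group, near-zero for the rest.
```

The half-quadratic step is what makes the modal objective tractable here: it turns the non-convex mode-seeking loss into a sequence of weighted least-squares problems, on each of which a standard group-sparse proximal solver applies.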

[1]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[2]  William Stafford Noble,et al.  Support vector machine , 2013 .

[3]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[4]  Ding-Xuan Zhou,et al.  Concentration estimates for learning with ℓ1-regularizer and data dependent hypothesis spaces , 2011 .

[5]  Ran He,et al.  Maximum Correntropy Criterion for Robust Face Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Xi Chen,et al.  Group Sparse Additive Models , 2012, ICML.

[7]  G. Tutz,et al.  Modelling beyond regression functions: an application of multimodal regression to speed–flow data , 2006 .

[8]  Jun Fan,et al.  A Statistical Learning Approach to Modal Regression , 2017, J. Mach. Learn. Res..

[9]  Yuan Yan Tang,et al.  Correntropy Matching Pursuit With Application to Robust Digit and Face Recognition , 2017, IEEE Transactions on Cybernetics.

[10]  Weifeng Liu,et al.  Correntropy: Properties and Applications in Non-Gaussian Signal Processing , 2007, IEEE Transactions on Signal Processing.

[11]  Ding-Xuan Zhou,et al.  Learning rates for the risk of kernel-based quantile regression estimators in additive models , 2014, 1405.3379.

[12]  C. Heinrich,et al.  The mode functional is not elicitable , 2014 .

[13]  S. Geer,et al.  High-dimensional additive modeling , 2008, 0806.4115.

[14]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[15]  Hong Chen,et al.  Group Sparse Additive Machine , 2017, NIPS.

[16]  Martin J. Wainwright,et al.  Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming , 2010, J. Mach. Learn. Res..

[17]  Ding-Xuan Zhou,et al.  Concentration estimates for learning with unbounded sampling , 2013, Adv. Comput. Math..

[18]  W. Yao,et al.  A New Regression Model: Modal Linear Regression , 2014 .

[19]  Gérard Biau,et al.  Simple estimation of the mode of a multivariate density , 2003 .

[20]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[21]  L. Wasserman,et al.  Nonparametric modal regression , 2014, 1412.1716.

[22]  Yaoliang Yu,et al.  Additive Approximations in High Dimensional Nonparametric Regression via the SALSA , 2016, ICML.

[23]  Wei Sun,et al.  Consistent selection of tuning parameters via variable selection stability , 2012, J. Mach. Learn. Res..

[24]  Adam Krzyżak,et al.  Nonparametric Regression Based on Hierarchical Interaction Models , 2017, IEEE Transactions on Information Theory.

[25]  C. J. Stone,et al.  Additive Regression and Other Nonparametric Models , 1985 .

[26]  H. Chernoff Estimation of the mode , 1964 .

[27]  Ming Yuan,et al.  Minimax Optimal Rates of Estimation in High Dimensional Additive Models: Universal Phase Transition , 2015, ArXiv.

[28]  G. Wahba Spline models for observational data , 1990 .

[29]  Volker Roth,et al.  The generalized LASSO , 2004, IEEE Transactions on Neural Networks.

[30]  Lei Shi Learning theory estimates for coefficient-based regularized regression , 2013 .

[31]  Ding-Xuan Zhou,et al.  Learning Theory: An Approximation Theory Viewpoint , 2007 .

[32]  Eric Matzner-Løber,et al.  Nonparametric forecasting: a comparison of three kernel-based methods , 1998 .

[33]  Surajit Ray,et al.  A Nonparametric Statistical Approach to Clustering via Mode Identification , 2007, J. Mach. Learn. Res..

[34]  Hong Chen,et al.  Kernel-based sparse regression with the correntropy-induced loss , 2018 .

[35]  Hao Helen Zhang,et al.  Component selection and smoothing in multivariate nonparametric regression , 2006, math/0702659.

[36]  Lei Yang,et al.  Model-free Variable Selection in Reproducing Kernel Hilbert Space , 2016, J. Mach. Learn. Res..

[37]  Yuan Yan Tang,et al.  $k$ -Times Markov Sampling for SVMC , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[38]  T. Sager,et al.  Maximum Likelihood Estimation of Isotonic Modal Regression , 1982 .

[39]  Fred J. Hickernell,et al.  On Dimension-independent Rates of Convergence for Function Approximation with Gaussian Kernels , 2012, SIAM J. Numer. Anal..

[40]  Ding-Xuan Zhou,et al.  Learning with sample dependent hypothesis spaces , 2008, Comput. Math. Appl..

[41]  Mila Nikolova,et al.  Analysis of Half-Quadratic Minimization Methods for Signal and Image Recovery , 2005, SIAM J. Sci. Comput..

[42]  Yiming Ying,et al.  Multi-kernel regularized classifiers , 2007, J. Complex..

[43]  Jian Huang,et al.  Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space , 2018 .

[44]  Runze Li,et al.  Local modal regression , 2012, Journal of nonparametric statistics.

[45]  Yen-Chi Chen,et al.  A tutorial on kernel density estimation and recent advances , 2017, 1704.03924.

[46]  T. Sager Estimation of a Multivariate Mode , 1978 .

[47]  Tuo Zhao,et al.  Sparse Additive Machine , 2012, AISTATS.

[48]  J. Horowitz,et al.  VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS. , 2010, Annals of statistics.

[49]  Johan A. K. Suykens,et al.  Learning with the maximum correntropy criterion induced losses for regression , 2015, J. Mach. Learn. Res..

[50]  Larry A. Wasserman,et al.  SpAM: Sparse Additive Models , 2007, NIPS.

[51]  Yen-Chi Chen Modal regression using kernel density estimation: A review , 2017, 1710.07004.

[52]  R. Tibshirani,et al.  Generalized Additive Models , 1986 .

[53]  T. Dalenius The Mode—A Neglected Statistical Parameter , 1965 .

[54]  Hong Chen,et al.  Error Analysis of Generalized Nyström Kernel Regression , 2016, NIPS.

[55]  Dinggang Shen,et al.  Regularized Modal Regression with Applications in Cognitive Impairment Prediction , 2017, NIPS.

[56]  W. Härdle,et al.  A note on prediction via estimation of the conditional mode function , 1986 .