Simultaneous Safe Feature and Sample Elimination for Sparse Support Vector Regression

Sparse support vector regression (SSVR) is an effective regression technique that has been successfully applied to many practical problems, but it remains challenging to handle large-scale problems. An appealing property of SSVR is its double sparsity: most irrelevant features and samples have no effect on the regressor. Motivated by this property, we propose a simultaneous safe feature and sample screening rule based on the strong convexity of the objective function (FSSR1) to accelerate SSVR in both the linear and nonlinear cases. Most inactive features and samples can be discarded simultaneously before training, so only one reduced SSVR (RSSVR) needs to be solved. To further speed up RSSVR, a screening rule based on the duality gap (FSSR2) continuously discards the remaining inactive variables while the reduced model is being trained. Combined with the grid-search method, the framework of our final method (FSSR-SSVR) alternately conducts FSSR1 and FSSR2, which yields substantial savings in both memory usage and computational cost. FSSR-SSVR has two appealing advantages: first, it is safe, i.e., the features and samples discarded by FSSR are guaranteed to be inactive; second, it has a synergy effect, in that the results of each feature screening can improve the performance of the next sample screening, and vice versa. Furthermore, the stochastic dual coordinate ascent (SDCA) method is employed as an efficient solver. Experiments on three synthetic datasets and 11 real-world datasets demonstrate the efficiency of our method.
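The abstract describes the FSSR2-style dynamic screening only at a high level and does not give the SSVR-specific bounds. As a self-contained illustration of the same duality-gap mechanism, the sketch below applies the standard GAP-safe feature test to the Lasso, whose screening geometry is analogous: any dual-feasible point plus the duality gap yields a safe sphere around the dual optimum, and features whose constraint is strictly satisfied over the whole sphere can be eliminated mid-training. The function names (`lasso_with_gap_screening`, `soft_threshold`) and the `screen_every` schedule are hypothetical illustrations, not the paper's algorithm.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t*|.| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_with_gap_screening(X, y, lam, n_iters=200, screen_every=10):
    """Proximal gradient for 0.5*||y - X b||^2 + lam*||b||_1,
    with periodic GAP-safe elimination of inactive features."""
    n, d = X.shape
    beta = np.zeros(d)
    active = np.ones(d, dtype=bool)        # features not yet eliminated
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant (squared spectral norm)
    col_norms = np.linalg.norm(X, axis=0)

    for it in range(n_iters):
        # Proximal-gradient step restricted to the current active set.
        Xa = X[:, active]
        resid = y - Xa @ beta[active]
        grad = -Xa.T @ resid
        beta[active] = soft_threshold(beta[active] - grad / L, lam / L)

        if it % screen_every == 0:
            # Dual-feasible point obtained by rescaling the residual
            # (feasibility must be checked against ALL features).
            resid = y - X[:, active] @ beta[active]
            corr = X.T @ resid
            theta = resid / max(lam, np.max(np.abs(corr)))
            # Duality gap between primal and dual objectives.
            primal = 0.5 * resid @ resid + lam * np.abs(beta).sum()
            dual = 0.5 * (y @ y) - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
            gap = max(primal - dual, 0.0)
            # GAP-safe sphere test: a feature j with
            # |x_j^T theta| + radius * ||x_j|| < 1 is provably inactive.
            radius = np.sqrt(2.0 * gap) / lam
            keep = np.abs(X.T @ theta) + radius * col_norms >= 1.0
            beta[~keep] = 0.0
            active &= keep

    return beta, active
```

Because the test relies only on dual feasibility and the current duality gap, it is safe at every iteration: eliminated features are guaranteed to be zero at the optimum, and the active set only shrinks as the solver converges. FSSR2 plays the same role for SSVR, additionally screening samples, while FSSR1 performs a one-shot static elimination before any training begins.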
