Bi-sparse optimization-based least squares regression

Abstract In regression-based forecasting, ever more instances and features are collected and added to regression models. When many of these instances and features are noisy or redundant, such models often suffer poor predictive accuracy and interpretability owing to overfitting and high computational cost. Moreover, least squares support vector regression (LSSVR) can hardly obtain sparse solutions or identify the important instances and features in the data. In this paper, a novel bi-sparse optimization-based least squares regression (BSOLSR) method is proposed within the LSSVR framework. Based on newly defined row and column kernel matrices, an l0-norm sparsification function is introduced into the LSSVR model. By alternately solving two unconstrained quadratic programming problems, or equivalently two systems of linear equations, BSOLSR predicts output values for given input points and yields interpretable results by simultaneously selecting relevant instances and important features. Experiments on real data sets, with comparisons against SVR, l1-norm SVR (L1SVR), LSSVR, and multiple kernel learning SVR (MKLSVR), show that BSOLSR effectively improves predictive accuracy, discovers representative instances and important features, and produces the interpretable results that are critical for many real-world applications.
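
The abstract only outlines the alternating scheme, so the following Python sketch is one plausible reading rather than the authors' exact algorithm: it assumes a linear "row" (instance) kernel, substitutes simple hard thresholding for the l0-norm sparsification term, and omits the LSSVR bias term. The names (bsolsr_sketch, tau) and the specific update equations are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bsolsr_sketch(X, y, gamma=10.0, tau=0.05, n_iter=10):
    """Illustrative alternating bi-sparse scheme (not the paper's exact updates).

    Alternates an LSSVR-style linear-system solve over instance multipliers
    (rows) with a pruning step over feature weights (columns); hard
    thresholding stands in for the l0-norm sparsification, and the LSSVR
    bias term is omitted for brevity.
    """
    n, d = X.shape
    beta = np.ones(d)                       # feature indicators, start dense
    for _ in range(n_iter):
        Xw = X * beta                       # zero out pruned feature columns
        K = Xw @ Xw.T                       # linear "row" (instance) kernel
        # Instance step: the usual LSSVR system (K + I/gamma) alpha = y,
        # re-solved on the currently selected features.
        alpha = np.linalg.solve(K + np.eye(n) / gamma, y)
        alpha[np.abs(alpha) < tau * np.abs(alpha).max()] = 0.0  # prune instances
        # Feature step: recover primal weights w = Xw^T alpha and prune
        # features with small weight magnitude; once beta[j] = 0 it stays 0.
        w = Xw.T @ alpha
        beta = np.where(np.abs(w) < tau * np.abs(w).max(), 0.0, beta)
    return alpha, beta, w

def predict(X_new, beta, w):
    # f(x) = w^T (beta * x), matching the feature-weighted linear kernel above.
    return (X_new * beta) @ w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 10))
    y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(60)
    alpha, beta, w = bsolsr_sketch(X, y)
    print("selected features:", np.flatnonzero(beta))        # ideally [0 3]
    print("selected instances:", np.count_nonzero(alpha), "/", len(alpha))
```

Each pass re-solves the instance system restricted to the surviving features, so instance and feature sparsity reinforce each other, which mirrors the alternating structure the abstract describes even though the threshold rule here is only a stand-in for the paper's l0-norm formulation.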
