Paper Information - A Proximal Approach for Sparse Multiclass SVM

A Proximal Approach for Sparse Multiclass SVM

Sparsity-inducing penalties are useful tools for designing multiclass support vector machines (SVMs). In this paper, we propose a convex optimization approach for efficiently and exactly solving the multiclass SVM learning problem with a sparse regularization and the multiclass hinge loss formulated by Crammer and Singer. We provide two algorithms: the first treats the hinge loss as a penalty term, and the second addresses the case where the hinge loss is enforced through a constraint. The related convex optimization problems can be solved efficiently thanks to the flexibility offered by recent primal-dual proximal algorithms and epigraphical splitting techniques. Experiments carried out on several datasets demonstrate the benefit of using the exact expression of the hinge loss rather than a smooth approximation. The efficiency of the proposed algorithms with respect to several state-of-the-art methods is also assessed through comparisons of execution times.
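The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes the standard form of the Crammer-Singer multiclass hinge loss and takes the l1 norm as one example of a sparse regularization; the abstract does not specify which penalty is used. The function names `multiclass_hinge_loss` and `soft_threshold` are hypothetical, and this is not the paper's algorithm, only the building blocks a proximal method would typically operate on.

```python
def multiclass_hinge_loss(scores, label):
    """Crammer-Singer multiclass hinge loss for a single sample.

    scores[k] is the classifier score for class k and label is the index
    of the true class. The loss is max_k (scores[k] - scores[label] + m_k),
    where m_k = 0 if k == label and m_k = 1 otherwise, so it is never negative.
    """
    return max(
        s - scores[label] + (0.0 if k == label else 1.0)
        for k, s in enumerate(scores)
    )


def soft_threshold(weights, tau):
    """Proximity operator of tau * ||.||_1 (assumed penalty), elementwise.

    Each entry is shrunk toward zero by tau; entries with magnitude at
    most tau are set to zero, which is what produces sparsity.
    """
    return [
        max(abs(w) - tau, 0.0) * (1.0 if w > 0 else -1.0)
        for w in weights
    ]


if __name__ == "__main__":
    # True class 0 wins, but class 2 is inside the unit margin.
    print(multiclass_hinge_loss([2.0, 0.5, 1.5], 0))
    # Small entries are zeroed, large ones are shrunk by tau.
    print(soft_threshold([1.5, -0.2, -2.0], 0.5))
```

The hinge loss here is kept in its exact, nonsmooth form, which is the case the abstract contrasts with smooth approximations; soft-thresholding is the closed-form proximity operator that makes the l1 term easy to handle in proximal schemes.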

[1] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.

[2] M. Aizerman, et al. Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning, 1964.

[3] Robert Tibshirani, et al. 1-norm Support Vector Machines, 2003, NIPS.

[4] Xiaotong Shen, et al. On L1-Norm Multiclass Support Vector Machines, 2007.

[5] Laurent Condat. Fast projection onto the simplex and the $\ell_1$ ball, 2015, Mathematical Programming.

[6] Yann LeCun, et al. Large-scale Learning with SVM and Convolutional Nets for Generic Object Categorization, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.

[8] Lifeng Wang, et al. On $L_1$-Norm Multi-class Support Vector Machines, 2006, 2006 5th International Conference on Machine Learning and Applications (ICMLA'06).

[9] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[10] Laura Schweitzer, et al. Advances In Kernel Methods Support Vector Learning, 2016.

[11] Ji Zhu, et al. Variable selection for multicategory SVM via sup-norm regularization, 2006.

[12] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[13] Josiane Mothe, et al. Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM, 2014, IEEE Transactions on Neural Networks and Learning Systems.

[14] Bang Công Vu, et al. A splitting algorithm for dual monotone inclusions involving cocoercive operators, 2011, Advances in Computational Mathematics.

[15] H. Zou, et al. The doubly regularized support vector machine, 2006.

[16] Koby Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.

[17] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.

[18] Yufeng Liu, et al. Variable Selection via A Combination of the L0 and L1 Penalties, 2007.

[19] Lorenzo Rosasco, et al. Proximal methods for the latent group lasso penalty, 2012, Computational Optimization and Applications.

[20] Laurent Condat, et al. A Fast Projection onto the Simplex and the $\ell_1$ Ball, 2015.

[21] Ben Taskar, et al. Joint covariate selection and joint subspace selection for multiple classification problems, 2010, Stat. Comput.

[22] Stéphane Canu, et al. $\ell_{p}-\ell_{q}$ Penalty for Sparse Linear and Sparse Multiple Kernel Multitask Learning, 2011, IEEE Transactions on Neural Networks.

[23] Julien Mairal, et al. Optimization with Sparsity-Inducing Penalties, 2011, Found. Trends Mach. Learn.

[24] Lawrence Carin, et al. Sparse multinomial logistic regression: fast algorithms and generalization bounds, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] P. Bühlmann, et al. The group lasso for logistic regression, 2008.

[26] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[27] Laurent Condat, et al. A Primal–Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms, 2012, Journal of Optimization Theory and Applications.

[28] Lorenzo Rosasco, et al. Nonparametric sparsity and regularization, 2012, J. Mach. Learn. Res.

[29] Ryan M. Rifkin, et al. In Defense of One-Vs-All Classification, 2004, J. Mach. Learn. Res.

[30] Giovanni Chierchia, et al. Parallel implementations of a disparity estimation algorithm based on a Proximal splitting method, 2012, 2012 Visual Communications and Image Processing.

[31] Alain Rakotomamonjy, et al. Automatic Feature Learning for Spatio-Spectral Image Classification With Sparse SVM, 2014, IEEE Transactions on Geoscience and Remote Sensing.

[32] Yoram Singer, et al. Boosting with structural sparsity, 2009, ICML '09.

[33] Stéphane Mallat, et al. Invariant Scattering Convolution Networks, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] J. Moreau. Proximité et dualité dans un espace hilbertien, 1965.

[35] Antonin Chambolle, et al. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging, 2011, Journal of Mathematical Imaging and Vision.

[36] Michael P. Friedlander, et al. Probing the Pareto Frontier for Basis Pursuit Solutions, 2008, SIAM J. Sci. Comput.

[37] Yufeng Liu, et al. Support vector machines with adaptive Lq penalty, 2007, Comput. Stat. Data Anal.

[38] Bernhard Schölkopf, et al. Use of the Zero-Norm with Linear Models and Kernel Methods, 2003, J. Mach. Learn. Res.

[39] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[40] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.

[41] Nelly Pustelnik, et al. Epigraphical Projection and Proximal Tools for Solving Constrained Convex Optimization Problems: Part I, 2012, ArXiv.

[42] Carmen Peláez-Moreno, et al. A Speech Recognizer Based on Multiclass SVMs with HMM-Guided Segmentation, 2005, NOLISP.

[43] Kazuhiro Seki, et al. Block coordinate descent algorithms for large-scale sparse multiclass classification, 2013, Machine Learning.

[44] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[45] Patrick L. Combettes, et al. A forward-backward view of some primal-dual optimization methods in image recovery, 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[46] Naonori Ueda, et al. Large-Scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex, 2014, 2014 22nd International Conference on Pattern Recognition.

[47] Thorsten Joachims, et al. Cutting-plane training of structural SVMs, 2009, Machine Learning.

[48] P. L. Combettes, et al. Primal-Dual Splitting Algorithm for Solving Inclusions with Mixtures of Composite, Lipschitzian, and Parallel-Sum Type Monotone Operators, 2011, Set-Valued and Variational Analysis.

[49] Paul S. Bradley, et al. Feature Selection via Concave Minimization and Support Vector Machines, 1998, ICML.

[50] Nelly Pustelnik, et al. Epigraphical proximal projection for sparse multiclass SVM, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[51] P. L. Combettes, et al. Variable metric forward–backward splitting with applications to monotone inclusions in duality, 2012, arXiv:1206.6791.

[52] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, advances in kernel methods, 1999.

[53] Cordelia Schmid, et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[54] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.

[55] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.

[56] Patrick L. Combettes, et al. Proximal Splitting Methods in Signal Processing, 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.

[57] Stephen P. Boyd, et al. Enhancing Sparsity by Reweighted ℓ1 Minimization, 2007, arXiv:0711.1612.

[58] Ivor W. Tsang, et al. Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets, 2010, ICML.

[59] H. Zou, et al. The $F_\infty$-norm support vector machine, 2008.

[60] Chih-Jen Lin, et al. A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification, 2010, J. Mach. Learn. Res.

[61] Thomas M. Cover, et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition, 1965, IEEE Trans. Electron. Comput.