Non-integer norm regularization SVM via Legendre-Fenchel duality

The support vector machine (SVM) is an effective classification and regression method that uses the VC theory of large margins to maximize predictive accuracy while avoiding over-fitting. L2-norm regularization is the most commonly used form. When the training data set contains many noise features, the L1-norm regularized SVM performs better. However, neither the L1-norm nor the L2-norm is the optimal regularizer when there are a large number of redundant features and only a small number of data points are useful for learning. We have therefore proposed an adaptive learning algorithm using the p-norm regularization SVM for 0
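
To illustrate why the choice of norm matters for redundant features, the following is a minimal sketch, not the authors' algorithm. It assumes a linear SVM objective of the form sum_j |w_j|^p + C * sum_i max(0, 1 - y_i (w . x_i + b)); the function names, the constant C, and the example weight vectors are hypothetical and chosen only for illustration.

```python
def p_norm_penalty(w, p):
    """Penalty sum_j |w_j|**p (assumed form of the p-norm regularizer, p > 0)."""
    return sum(abs(wj) ** p for wj in w)


def hinge_loss(w, b, X, y):
    """Total hinge loss sum_i max(0, 1 - y_i * (w . x_i + b)), labels in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += max(0.0, 1.0 - yi * score)
    return total


def objective(w, b, X, y, p, C=1.0):
    """Assumed p-norm regularized SVM objective: penalty plus C times hinge loss."""
    return p_norm_penalty(w, p) + C * hinge_loss(w, b, X, y)


# Two hypothetical weight vectors with the same L2 norm: one sparse, one dense.
w_sparse = [1.0, 0.0, 0.0, 0.0]
w_dense = [0.5, 0.5, 0.5, 0.5]

for p in (2.0, 1.0, 0.5):
    # Compare how each exponent penalizes the sparse and the dense vector.
    print(p, p_norm_penalty(w_sparse, p), p_norm_penalty(w_dense, p))
```

Under these assumptions, the two vectors receive the same penalty at p = 2, while at p = 1 and p = 0.5 the dense vector is penalized more heavily than the sparse one. This is the intuition behind using smaller exponents when most features are redundant.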
