LINEX Support Vector Machine for Large-Scale Classification

The traditional soft-margin support vector machine (C-SVM) uses the hinge loss to build a classifier under the “maximum-margin” principle. However, C-SVM depends only on the support vectors, so the information carried by the remaining instances is lost. The least squares support vector machine (LS-SVM) was later proposed with the square loss ($l_{2}$-loss). It imposes equality constraints instead of inequalities and therefore takes all the instances into account. However, the square loss is still not ideal, since it penalizes instances equally on both sides of their center plane. This does not match reality, because instances lying between the two center planes deserve a heavier penalty than the others. To this end, we propose a novel SVM method that adopts the asymmetric LINEX (linear-exponential) loss, which we call LINEX-SVM. The LINEX loss treats instances differently according to the importance of each point: it gives a heavier penalty to points between the two center planes and a lighter penalty to points outside their corresponding center planes. Comprehensive experiments have been conducted to validate the effectiveness of LINEX-SVM.
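To make the asymmetry concrete, here is a minimal Python sketch. It is our illustration, not the authors' code. It assumes the LS-SVM-style residual $\xi = 1 - y(w^\top x + c)$. Under that convention $\xi > 0$ for a point that has not reached its own center plane (including points between the two center planes), and $\xi < 0$ for a point beyond it. It also assumes the standard LINEX form $L(\xi) = b(e^{a\xi} - a\xi - 1)$ with $a > 0$, $b > 0$. The helper names (`linex_loss`, `train_linex_svm`, `predict`) and the full-batch gradient-descent solver on the primal are hypothetical. The paper's exact formulation and large-scale solver may differ.

```python
import math
from typing import List, Sequence, Tuple


def hinge_loss(xi: float) -> float:
    """C-SVM hinge loss on the residual xi = 1 - y*f(x); zero once xi <= 0."""
    return max(0.0, xi)


def square_loss(xi: float) -> float:
    """LS-SVM square (l2) loss; symmetric, so it penalizes xi and -xi equally."""
    return xi * xi


def linex_loss(xi: float, a: float = 1.0, b: float = 1.0) -> float:
    """LINEX loss b*(exp(a*xi) - a*xi - 1), convex for a != 0 and b > 0.

    With a > 0 it grows exponentially for xi > 0 (point short of its center
    plane) and only about linearly for xi < 0 (point beyond its center plane).
    """
    if a == 0 or b <= 0:
        raise ValueError("LINEX loss requires a != 0 and b > 0")
    return b * (math.exp(a * xi) - a * xi - 1.0)


def linex_grad(xi: float, a: float = 1.0, b: float = 1.0) -> float:
    """Derivative of the LINEX loss: dL/dxi = a*b*(exp(a*xi) - 1)."""
    return a * b * (math.exp(a * xi) - 1.0)


def train_linex_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    C: float = 1.0,
    a: float = 1.0,
    lr: float = 0.01,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Hypothetical primal solver: full-batch gradient descent on
    0.5*||w||^2 + (C/n) * sum_i L(1 - y_i*(w.x_i + c)), bias c unregularized.
    """
    n, d = len(X), len(X[0])
    w, c = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gc = list(w), 0.0  # the regularizer 0.5*||w||^2 contributes w
        for x_i, y_i in zip(X, y):
            f = sum(wj * xj for wj, xj in zip(w, x_i)) + c
            g = (C / n) * linex_grad(1.0 - y_i * f, a)
            # Chain rule: d(xi)/dw = -y_i * x_i and d(xi)/dc = -y_i.
            for j in range(d):
                gw[j] -= g * y_i * x_i[j]
            gc -= g * y_i
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        c -= lr * gc
    return w, c


def predict(w: Sequence[float], c: float, x: Sequence[float]) -> int:
    """Sign of the decision function w.x + c, as a +1/-1 label."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + c >= 0 else -1


if __name__ == "__main__":
    # Same |xi|, different penalties: only LINEX is asymmetric.
    print(square_loss(1.0) == square_loss(-1.0))  # True
    print(linex_loss(1.0) > linex_loss(-1.0))  # True (e - 2 vs 1/e)
    X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
    y = [1, 1, 1, -1, -1, -1]
    w, c = train_linex_svm(X, y)
    print([predict(w, c, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

With $a = 1$, a point one unit short of its center plane costs $e - 2 \approx 0.718$, while a point one unit beyond it costs only $e^{-1} \approx 0.368$. The square loss charges 1 in both cases. Full-batch updates are used here only for brevity. For genuinely large data, one would presumably switch to stochastic or mini-batch updates.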
