Improving the Generalization Performance of Multi-class SVM via Angular Regularization

In multi-class support vector machines (MSVMs) for classification, a core issue is regularizing the coefficient vectors to reduce overfitting. Various regularizers have been proposed, such as the ℓ2, ℓ1, and trace norms. In this paper, we introduce a new type of regularization, angular regularization, which encourages the coefficient vectors to have larger angles between them so that class regions are widened and can flexibly accommodate unseen samples. We propose a novel angular regularizer based on the singular values of the coefficient matrix: making the singular values more uniform reduces the correlation among different classes and drives the angles between coefficient vectors to increase. In our generalization error analysis, we show that decreasing this regularizer effectively reduces the generalization error bound. On various datasets, we demonstrate the efficacy of the regularizer in reducing overfitting.
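The link between uniform singular values and large angles can be illustrated with a small NumPy sketch. The abstract does not give the exact form of the regularizer, so `singular_value_nonuniformity` below is a hypothetical stand-in (the variance of the normalized singular values), not the paper's definition; the helper names and the toy matrices are likewise assumptions made for illustration. The sketch assumes one coefficient vector per row and no more classes than features.

```python
import numpy as np


def pairwise_angles_deg(W):
    """Angles in degrees between every pair of rows of W (one row per class)."""
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    # Clip guards against round-off pushing cosines slightly outside [-1, 1].
    cos = np.clip(unit @ unit.T, -1.0, 1.0)
    upper = np.triu_indices(W.shape[0], k=1)
    return np.degrees(np.arccos(cos[upper]))


def singular_value_nonuniformity(W):
    """Hypothetical penalty (an assumption, not the paper's regularizer):
    variance of the singular values after scaling them to sum to one.
    It is zero exactly when all singular values are equal."""
    s = np.linalg.svd(W, compute_uv=False)
    s = s / s.sum()
    return float(np.var(s))


# Toy coefficient matrices: 3 classes, 3 features.
W_orthogonal = np.eye(3)                 # mutually orthogonal class vectors
W_correlated = np.array([[1.0, 0.0, 0.0],
                         [1.0, 0.1, 0.0],
                         [1.0, 0.0, 0.1]])  # nearly parallel class vectors

penalty_orthogonal = singular_value_nonuniformity(W_orthogonal)
penalty_correlated = singular_value_nonuniformity(W_correlated)
angles_orthogonal = pairwise_angles_deg(W_orthogonal)
angles_correlated = pairwise_angles_deg(W_correlated)
```

In this toy setting, the matrix with equal singular values has pairwise angles of 90 degrees and a penalty of zero, while the nearly parallel vectors give small angles and a strictly positive penalty. Minimizing such a term alongside the MSVM loss would push the coefficient vectors apart, which is the qualitative behavior the abstract describes.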
