F-SVM: Combination of Feature Transformation and SVM Learning via Convex Relaxation

The generalization error bound of the support vector machine (SVM) depends on the ratio of the radius to the margin. Conventional SVM, however, only maximizes the margin and ignores the radius, which limits its performance when a feature transformation and the SVM classifier are learned jointly. Although several approaches have been proposed to integrate radius and margin information, most of them either require the transformation matrix to be diagonal or are nonconvex and computationally expensive. In this paper, we propose a novel approximation of the radius of the minimum enclosing ball in feature space, and on that basis a convex radius-margin-based SVM model, F-SVM, for joint learning of the feature transformation and the SVM classifier. The F-SVM model is solved with a generalized block coordinate descent method, in which the feature transformation is updated by gradient descent and the classifier is updated with an existing SVM solver. By incorporating kernel principal component analysis, F-SVM is further extended to joint learning of a nonlinear transformation and the classifier. F-SVM can also be combined with deep convolutional networks to improve image classification performance. Experiments on the UCI, LFW, MNIST, CIFAR-10, CIFAR-100, and Caltech101 data sets demonstrate the effectiveness of F-SVM.
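The reason the margin alone is not a sufficient criterion once the features themselves are learned can be illustrated with a small numerical sketch. It is not the F-SVM model or its radius approximation: the toy data, the fixed separating hyperplane, and the helper names (`centroid_radius`, `geometric_margin`, `radius_margin_ratio`, `scale`) are all assumptions made for illustration, and the radius is taken as the largest distance to the centroid, which only upper-bounds the radius of the minimum enclosing ball. Scaling every feature by a constant inflates the margin by that constant, yet leaves the radius-to-margin ratio unchanged.

```python
import math

def centroid_radius(points):
    """Largest distance from the centroid (an upper bound on the
    radius of the minimum enclosing ball, used here as a stand-in)."""
    dim = len(points[0])
    center = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    return max(math.dist(p, center) for p in points)

def geometric_margin(points, labels, w, b):
    """Smallest signed distance of any labeled point to the
    hyperplane w.x + b = 0."""
    norm_w = math.hypot(*w)
    return min(
        y * (sum(wk * xk for wk, xk in zip(w, p)) + b) / norm_w
        for p, y in zip(points, labels)
    )

def radius_margin_ratio(points, labels, w, b):
    """Ratio of the (approximate) radius to the geometric margin."""
    return centroid_radius(points) / geometric_margin(points, labels, w, b)

def scale(points, c):
    """Apply the isotropic feature transformation x -> c * x."""
    return [[c * xk for xk in p] for p in points]

# Hypothetical toy data: two classes separated by the hyperplane x1 = 0.
X = [[-2.0, 1.0], [-1.0, 0.0], [1.0, 0.0], [2.0, -1.0]]
y = [-1, -1, 1, 1]
w, b = [1.0, 0.0], 0.0

for c in (1.0, 10.0):
    Xc = scale(X, c)
    print(
        f"scale={c:5.1f}  margin={geometric_margin(Xc, y, w, b):6.2f}  "
        f"radius/margin={radius_margin_ratio(Xc, y, w, b):.4f}"
    )
```

Under these assumptions the margin grows tenfold while the ratio stays the same, so a margin-only objective can be improved by rescaling the transformation without any change in the quantity that the error bound depends on.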
