Structural Regularized Support Vector Machine: A Framework for Structural Large Margin Classifier

The support vector machine (SVM), one of the most popular classifiers, seeks a hyperplane that separates two classes of data with maximal margin. SVM classifiers focus on maximizing the separation between classes rather than on exploiting the structure of the training data within each class. However, such structural information, as implicit prior knowledge, has recently been found to be vital for designing a good classifier in various real-world problems. Accordingly, using as much prior structural information in the data as possible to improve a classifier's generalization ability has yielded a class of effective structural large margin classifiers, such as the structured large margin machine (SLMM) and the Laplacian support vector machine (LapSVM). In this paper, we unify these classifiers into a common framework along two dimensions: the concept of “structural granularity” and the formulation of the optimization problem. We exploit the quadratic programming (QP) and second-order cone programming (SOCP) methods and derive a novel large margin classifier, which we call the structural regularized support vector machine (SRSVM). SLMM sits at the intersection of cluster granularity and SOCP, and LapSVM at the intersection of point granularity and QP. SRSVM, by contrast, sits at the intersection of cluster granularity and QP; it therefore follows the same optimization formulation as LapSVM, which overcomes the large computational complexity and non-sparse solutions of SLMM. In addition, it integrates the compactness within classes and the separability between classes simultaneously. Furthermore, generalization bounds for these algorithms can be derived through eigenvalue analysis of the kernel matrices. Experimental results demonstrate that SRSVM is often superior in classification and generalization performance to the state-of-the-art algorithms in the framework, with both the same and different structural granularities.
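
As an illustration of how within-class compactness at the cluster granularity can be combined with a large margin in a single QP, one plausible linear formulation is sketched below. The notation is an assumption made for this sketch and is not taken from the abstract: $\Sigma_k$ denotes the covariance matrix of the $k$-th cluster found within the classes, $\lambda \ge 0$ weights the structural term, $C > 0$ penalizes the slack variables $\xi_i$, and $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ are the $n$ training pairs.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \frac{1}{2}\lVert w \rVert^{2}
    + \frac{\lambda}{2}\, w^{\top}\Bigl(\sum_{k} \Sigma_{k}\Bigr) w
    + C \sum_{i=1}^{n} \xi_{i} \\
\text{s.t.}\quad
  & y_{i}\bigl(w^{\top} x_{i} + b\bigr) \ge 1 - \xi_{i},
    \qquad \xi_{i} \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Under these assumptions the first term rewards a large margin (separability between classes), while the second term penalizes the variance of each cluster along $w$ (compactness within classes). Because every $\Sigma_k$ is positive semidefinite, the objective stays a convex quadratic with linear constraints, so the problem remains a QP, and setting $\lambda = 0$ recovers the standard soft-margin SVM.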
