Extended Polynomial Growth Transforms for Design and Training of Generalized Support Vector Machines

Growth transformations constitute a class of fixed-point multiplicative update algorithms that were originally proposed for optimizing polynomial and rational functions over a domain of probability measures. In this paper, we extend this framework to the domain of bounded real variables, so that it can be applied to optimize the dual cost function of a generic support vector machine (SVM). The approach can therefore be used not only to train traditional soft-margin binary SVMs, one-class SVMs, and probabilistic SVMs, but also to design novel SVM variants with different types of convex and quasi-convex loss functions. We propose an efficient training algorithm based on polynomial growth transforms, and compare and contrast the properties of the different SVM variants on several synthetic and benchmark data sets. Preliminary experiments show that the proposed multiplicative update algorithm is more scalable and exhibits better convergence than standard quadratic and nonlinear programming solvers. Although the formulation and the underlying algorithms are validated here only for SVM-based learning, the proposed approach is general and can be applied to a wide variety of optimization problems and statistical learning models.
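
To make the classical setting concrete before the extension to bounded real variables, the minimal Python sketch below illustrates the original Baum-Eagon growth transform: maximizing a polynomial with non-negative coefficients, here P(x) = x^T A x, over the probability simplex via the multiplicative fixed-point update x_i <- x_i (dP/dx_i) / sum_j x_j (dP/dx_j). This is only an illustration of the classical transform that the paper generalizes, not the proposed SVM training algorithm; the matrix A, the function name growth_transform_step, and the iteration count are illustrative assumptions.

    # Minimal sketch of the classical Baum-Eagon growth transform on the
    # probability simplex (the starting point this paper extends).
    # All names here are illustrative, not taken from the paper.
    import numpy as np

    def growth_transform_step(x, A):
        """One multiplicative update: x_i <- x_i * (dP/dx_i) / sum_j x_j * (dP/dx_j).

        For P(x) = x^T A x with A non-negative and symmetric, dP/dx_i = 2 (A x)_i,
        and the common factor of 2 cancels in the normalization.
        """
        grad = A @ x                    # proportional to the partial derivatives
        x_new = x * grad                # multiplicative (growth) update
        return x_new / x_new.sum()      # renormalize back onto the simplex

    # Usage: repeated updates never decrease P(x), by the Baum-Eagon inequality.
    rng = np.random.default_rng(0)
    A = rng.random((5, 5))
    A = 0.5 * (A + A.T)                 # symmetric, non-negative coefficients
    x = np.full(5, 1.0 / 5)             # start at the simplex barycenter
    for _ in range(100):
        x = growth_transform_step(x, A)
    print(x, x @ A @ x)

Each update stays on the simplex by construction, and the Baum-Eagon inequality guarantees that P(x) is non-decreasing across iterations; it is this fixed-point, multiplicative character that the paper carries over to the box-constrained SVM dual.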
