Support Vector Machines: Training and Applications

The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints, in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory, and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which are used both to generate improved iterates and to establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm, which uses a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. As an application of SVMs, we present preliminary results obtained by applying SVMs to the problem of detecting frontal human faces in real images.
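For concreteness, the quadratic program referred to above is the standard soft-margin SVM dual; the statement below uses notation of our own choosing ($\ell$ training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1,+1\}$, kernel $K$, penalty parameter $C$) rather than text copied from the paper:

\[
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{\ell} \alpha_i
  \;-\; \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell}
        \alpha_i \alpha_j \, y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)
\qquad \text{s.t.} \quad
\sum_{i=1}^{\ell} y_i \alpha_i = 0, \quad
0 \le \alpha_i \le C, \;\; i = 1, \dots, \ell .
\]

There is one variable $\alpha_i$ per data point, one linear equality constraint, and box constraints, exactly as the abstract describes. The Hessian $Q_{ij} = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$ is dense, so storing it requires $O(\ell^2)$ memory; this is the obstacle the decomposition algorithm is designed to remove.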
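To make the decomposition idea concrete, here is a minimal sketch in Python of the generic scheme the abstract outlines: repeatedly optimize a small working set of dual variables while the rest stay fixed, using KKT optimality conditions as the stopping rule. This is an illustration under our own assumptions, not the paper's implementation: SciPy's SLSQP stands in for the second-order Reduced Gradient sub-problem solver, and all names (`svm_decomposition`, `rbf_kernel`, the working-set size `q`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svm_decomposition(X, y, C=1.0, q=20, tol=1e-3, max_iter=200, gamma=0.5):
    """Working-set decomposition for the soft-margin SVM dual (sketch).

    One dual variable alpha_i per data point; each iteration optimizes
    only a working set B of size ~q while the remaining variables stay
    fixed, and KKT conditions supply the stopping criterion.
    """
    n = len(y)
    eps = 1e-8
    # Full Hessian built up front purely for clarity of the sketch; the
    # point of decomposition is that a real implementation materializes
    # only the q columns of Q indexed by the working set.
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    alpha = np.zeros(n)
    grad = -np.ones(n)            # gradient of 0.5 a'Qa - e'a at alpha = 0
    for _ in range(max_iter):
        # KKT-based stopping rule: maximal-violating-pair gap.
        yg = -y * grad
        up = ((alpha < C - eps) & (y > 0)) | ((alpha > eps) & (y < 0))
        lo = ((alpha < C - eps) & (y < 0)) | ((alpha > eps) & (y > 0))
        if yg[up].max() - yg[lo].min() < tol:
            break                 # all optimality conditions satisfied
        # Working set: the most violating indices from each side.
        up_idx = np.where(up)[0][np.argsort(-yg[up])[:q // 2]]
        lo_idx = np.where(lo)[0][np.argsort(yg[lo])[:q // 2]]
        B = np.unique(np.concatenate([up_idx, lo_idx]))
        N = np.setdiff1d(np.arange(n), B)
        # Sub-problem over alpha_B with alpha_N frozen:
        #   min 0.5 aB' Q_BB aB + aB' (Q_BN alpha_N - 1)
        #   s.t. y_B' aB = -y_N' alpha_N,  0 <= aB <= C.
        QBB = Q[np.ix_(B, B)]
        qlin = Q[np.ix_(B, N)] @ alpha[N] - 1.0
        rhs = -(y[N] @ alpha[N])
        res = minimize(lambda a: 0.5 * a @ QBB @ a + a @ qlin,
                       alpha[B],
                       jac=lambda a: QBB @ a + qlin,
                       bounds=[(0.0, C)] * len(B),
                       constraints={'type': 'eq',
                                    'fun': lambda a: y[B] @ a - rhs},
                       method='SLSQP')   # stand-in for the paper's
                                         # reduced-gradient sub-solver
        grad += Q[:, B] @ (res.x - alpha[B])   # rank-q gradient update
        alpha[B] = res.x
    return alpha
```

At the solution, the points with $\alpha_i > 0$ are the support vectors. Note that each iteration only needs the kernel columns indexed by the working set (plus the cheap rank-q gradient update), which is what makes the approach viable when the full dense Hessian cannot be held in memory.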
