Chunking for massive nonlinear kernel classification

A chunking procedure [Bradley, P.S. and Mangasarian, O.L., 2000, Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13, 1–10. Available online at: ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-05.ps], utilized in [Mangasarian, O.L. and Thompson, M.E., 2006, Massive data classification via unconstrained support vector machines. Journal of Optimization Theory and Applications, 131, 315–325. Available online at: ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/06-01.pdf] for linear classifiers, is proposed here for nonlinear kernel classification of massive datasets. A highly accurate algorithm, based on nonlinear support vector machines with a linear programming formulation [Mangasarian, O.L., 2000, Generalized support vector machines. In: A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans (Eds) Advances in Large Margin Classifiers (Cambridge, MA: MIT Press), pp. 135–146. Available online at: ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps], is developed here as a completely unconstrained minimization problem [Mangasarian, O.L., 2005, Exact 1-Norm support vector machines via unconstrained convex differentiable minimization. Technical Report 05-03, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin. Available online at: ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/05-03.ps. Journal of Machine Learning Research, 7, 1517–1530, 2006]. Together with chunking, this approach yields a simple and accurate method for generating nonlinear classifiers for a 250,000-point dataset, a size that typically exceeds machine capacity when standard linear programming methods such as CPLEX [ILOG, 2003, ILOG CPLEX 9.0 User's Manual, Incline Village, Nevada. Available online at: http://www.ilog.com/products/cplex/] are used. Because a 1-norm support vector machine underlies the proposed method, combining it with a reduced support vector machine formulation [Lee, Y.-J. and Mangasarian, O.L., 2001, RSVM: reduced support vector machines. Proceedings of the First SIAM International Conference on Data Mining, Chicago, 5–7 April, CD-ROM. Available online at: ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-07.ps] minimizes the number of kernel functions needed to generate a simplified nonlinear classifier. †Data Mining Institute Technical Report 06-07, December 2006.
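The chunked, unconstrained 1-norm kernel SVM described above can be sketched as follows. This is a toy illustration under explicit assumptions, not the paper's exact algorithm: the plus function and the 1-norm are replaced by smooth surrogates so an off-the-shelf quasi-Newton method (SciPy's L-BFGS-B, standing in for the paper's generalized Newton method) can minimize the completely unconstrained objective; each chunk's solve is warm-started from the previous chunk's solution, playing the role of the retained support constraints in the chunking procedure; and a small fixed basis set mimics the reduced (RSVM) kernel. All function names, parameter values, and the smoothing scheme are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian kernel matrix K_ij = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_chunk(K, y, w0, lam=1e-3, alpha=10.0, eps=1e-4):
    """Minimize a smoothed surrogate of the 1-norm kernel SVM on one chunk:
         mean_i plus(1 - y_i (K_i u - b))  +  lam * ||u||_1,
       with plus(.) smoothed by a softplus and |.| by sqrt(.^2 + eps), so the
       problem is completely unconstrained and differentiable."""
    m, nb = K.shape

    def obj_grad(w):
        u, b = w[:nb], w[nb]
        r = 1.0 - y * (K @ u - b)                      # margin violations
        hinge = np.logaddexp(0.0, alpha * r) / alpha   # smooth plus(r)
        f = hinge.mean() + lam * np.sqrt(u * u + eps).sum()
        s = expit(alpha * r) / m                       # d(smooth plus)/dr / m
        gu = -K.T @ (s * y) + lam * u / np.sqrt(u * u + eps)
        gb = np.sum(s * y)
        return f, np.append(gu, gb)

    return minimize(obj_grad, w0, jac=True, method="L-BFGS-B").x

def chunked_fit(X, y, basis, n_chunks=5, passes=3, **kw):
    """Sweep over the data in chunks, warm-starting each chunk's solve from
       the previous solution, so only one chunk's kernel block is ever in
       memory at a time."""
    w = np.zeros(len(basis) + 1)
    for _ in range(passes):
        for idx in np.array_split(np.arange(len(X)), n_chunks):
            K = rbf_kernel(X[idx], basis)
            w = fit_chunk(K, y[idx], w, **kw)
    return w

def predict(X, basis, w):
    """Nonlinear classifier sign(K(x, basis') u - b)."""
    return np.sign(rbf_kernel(X, basis) @ w[:-1] - w[-1])
```

Because the 1-norm term drives many components of u toward zero, only a fraction of the basis points contribute kernel terms to the final classifier, which is the simplification the reduced formulation is after.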
