Chunking for massive nonlinear kernel classification

A chunking procedure [Bradley, P.S. and Mangasarian, O.L., 2000, Massive data discrimination via linear support vector machines. Optimization Methods and Software, 13, 1–10. Available online at: ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-05.ps], utilized in [Mangasarian, O.L. and Thompson, M.E., 2006, Massive data classification via unconstrained support vector machines. Journal of Optimization Theory and Applications, 131, 315–325. Available online at: ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/06-01.pdf] for linear classifiers, is proposed here for nonlinear kernel classification of massive datasets. A highly accurate algorithm, based on nonlinear support vector machines with a linear programming formulation [Mangasarian, O.L., 2000, Generalized support vector machines. In: A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans (Eds) Advances in Large Margin Classifiers (Cambridge, MA: MIT Press), pp. 135–146. Available online at: ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps], is developed here as a completely unconstrained minimization problem [Mangasarian, O.L., 2005, Exact 1-Norm support vector machines via unconstrained convex differentiable minimization. Technical Report 05-03, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin. Available online at: ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/05-03.ps. Journal of Machine Learning Research, 7, 1517–1530, 2006]. Together with chunking, this approach yields a simple and accurate method for generating nonlinear classifiers for a 250,000-point dataset, a size that typically exceeds machine capacity when standard linear programming methods such as CPLEX [ILOG, 2003, ILOG CPLEX 9.0 User's Manual, Incline Village, Nevada. Available online at: http://www.ilog.com/products/cplex/] are used. Because a 1-norm support vector machine underlies the proposed method, combining it with a reduced support vector machine formulation [Lee, Y.-J. and Mangasarian, O.L., 2001, RSVM: reduced support vector machines. Proceedings of the First SIAM International Conference on Data Mining, Chicago, 5–7 April, CD-ROM. Available online at: ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/00-07.ps] minimizes the number of kernel functions needed to generate a simplified nonlinear classifier. †Data Mining Institute Technical Report 06-07, December 2006.
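The chunked, unconstrained 1-norm kernel SVM described above can be sketched as follows. This is a toy illustration under explicit assumptions, not the paper's exact algorithm: the plus function and the 1-norm are replaced by smooth surrogates so an off-the-shelf quasi-Newton method (SciPy's L-BFGS-B, standing in for the paper's generalized Newton method) can minimize the completely unconstrained objective; each chunk's solve is warm-started from the previous chunk's solution, playing the role of the retained support constraints in the chunking procedure; and a small fixed basis set mimics the reduced (RSVM) kernel. All function names, parameter values, and the smoothing scheme are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian kernel matrix K_ij = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_chunk(K, y, w0, lam=1e-3, alpha=10.0, eps=1e-4):
    """Minimize a smoothed surrogate of the 1-norm kernel SVM on one chunk:
         mean_i plus(1 - y_i (K_i u - b))  +  lam * ||u||_1,
       with plus(.) smoothed by a softplus and |.| by sqrt(.^2 + eps), so the
       problem is completely unconstrained and differentiable."""
    m, nb = K.shape

    def obj_grad(w):
        u, b = w[:nb], w[nb]
        r = 1.0 - y * (K @ u - b)                      # margin violations
        hinge = np.logaddexp(0.0, alpha * r) / alpha   # smooth plus(r)
        f = hinge.mean() + lam * np.sqrt(u * u + eps).sum()
        s = expit(alpha * r) / m                       # d(smooth plus)/dr / m
        gu = -K.T @ (s * y) + lam * u / np.sqrt(u * u + eps)
        gb = np.sum(s * y)
        return f, np.append(gu, gb)

    return minimize(obj_grad, w0, jac=True, method="L-BFGS-B").x

def chunked_fit(X, y, basis, n_chunks=5, passes=3, **kw):
    """Sweep over the data in chunks, warm-starting each chunk's solve from
       the previous solution, so only one chunk's kernel block is ever in
       memory at a time."""
    w = np.zeros(len(basis) + 1)
    for _ in range(passes):
        for idx in np.array_split(np.arange(len(X)), n_chunks):
            K = rbf_kernel(X[idx], basis)
            w = fit_chunk(K, y[idx], w, **kw)
    return w

def predict(X, basis, w):
    """Nonlinear classifier sign(K(x, basis') u - b)."""
    return np.sign(rbf_kernel(X, basis) @ w[:-1] - w[-1])
```

Because the 1-norm term drives many components of u toward zero, only a fraction of the basis points contribute kernel terms to the final classifier, which is the simplification the reduced formulation is after.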
