Fast Binary and Multi-Output Reduced Set Selection

We propose fast algorithms for reducing the number of kernel evaluations in the testing phase for methods such as Support Vector Machines (SVM) and Ridge Regression (RR). For non-sparse methods such as RR, this results in significantly improved prediction time. For binary SVMs, whose expansions are already sparse, the payoff comes mainly in noisy or large-scale problems. We then extend our method to multi-class problems where, by choosing expansion vectors that describe all the hyperplanes jointly, we again achieve significant gains.

TR-132, November 29, 2004
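To make the idea of a shared reduced expansion concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm of this report: it swaps in a generic greedy selection that, at each step, adds the training point whose kernel function most reduces the RKHS approximation error summed over all outputs, then refits the coefficients by projection. The function names (`rbf_kernel`, `greedy_reduced_set`), the RBF kernel, the `jitter` regularizer, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def greedy_reduced_set(K, alpha, m, jitter=1e-6):
    """Pick m <= n expansion points shared by all outputs (illustrative only).

    K     : (n, n) kernel matrix on the training points
    alpha : (n,) or (n, T) coefficients of the full expansions
            f_t = sum_i alpha[i, t] k(x_i, .)
    Returns the selected indices, the refitted coefficients beta (m, T),
    and the captured squared RKHS norm after each greedy step.
    """
    alpha = alpha.reshape(len(K), -1)
    target = K @ alpha  # target[j, t] = <f_t, k(x_j, .)>
    selected, gains, beta = [], [], None
    for _ in range(m):
        best = None
        for j in range(len(K)):
            if j in selected:
                continue
            S = selected + [j]
            # Projection of every f_t onto span{k(x_s, .) : s in S};
            # jitter keeps the small system well conditioned.
            K_ss = K[np.ix_(S, S)] + jitter * np.eye(len(S))
            b = np.linalg.solve(K_ss, target[S])
            gain = float(np.sum(target[S] * b))  # summed over all outputs
            if best is None or gain > best[0]:
                best = (gain, j, b)
        gains.append(best[0])
        selected.append(best[1])
        beta = best[2]
    return selected, beta, gains

# Toy two-output ridge regression problem (made-up data).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
Y = np.hstack([np.sin(X), np.cos(X)]) + 0.05 * rng.standard_normal((40, 2))
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 0.1 * np.eye(40), Y)  # full expansion: 40 vectors

idx, beta, gains = greedy_reduced_set(K, alpha, m=8)

X_test = np.linspace(-3, 3, 200)[:, None]
full = rbf_kernel(X_test, X) @ alpha          # 40 kernel evaluations per test point
reduced = rbf_kernel(X_test, X[idx]) @ beta   # 8 kernel evaluations per test point
print("max |full - reduced| =", np.abs(full - reduced).max())
```

Because both outputs share the same eight expansion vectors, each test point needs only eight kernel evaluations regardless of the number of outputs. This exhaustive search costs far more than a practical method would and is meant only to show the objective being optimized.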
