Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors

Support vector machines (and other kernel machines) offer robust modern machine learning methods for nonlinear classification. However, relative to other alternatives (such as linear methods, decision trees and neural networks), they can be orders of magnitude slower at query-time. Unlike existing methods that attempt to speedup querytime, such as reduced set compression (e.g. (Burges, 1996)) and anytime bounding (e.g. (DeCoste, 2002), we propose a new and efficient approach based on treating the kernel machine classifier as a special form of k nearest-neighbor. Our approach improves upon a traditional k-NN by determining at query-time a good k for each query, based on pre-query analysis guided by the original robust kernel machine. We demonstrate effectiveness on high-dimensional benchmark MNIST data, observing a greater than 100- fold reduction in the number of SVs required per query (amortized over all 45 pairwise MNIST digit classifiers), with no extra test errors (in fact, it happens to make 4 fewer).

[1]  Gunnar Rätsch,et al.  A Mathematical Programming Approach to the Kernel Fisher Algorithm , 2000, NIPS.

[2]  Michael I. Jordan,et al.  Minimax Probability Machine , 2001, NIPS.

[3]  Dennis DeCoste,et al.  Anytime Interval-Valued Outputs for Kernel Machines: Fast Support Vector Machine Classification via Distance Geometry , 2002, ICML.

[4]  Jack J. Dongarra,et al.  Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.

[5]  Peter Yianilos,et al.  Excluded middle vantage point forests for nearest neighbor search , 1998 .

[6]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[7]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[8]  Christopher J. C. Burges,et al.  Simplified Support Vector Decision Rules , 1996, ICML.

[9]  Andrew Blake,et al.  Computationally efficient face detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[10]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[11]  Bernhard Schölkopf,et al.  Improving the Accuracy and Speed of Support Vector Machines , 1996, NIPS.

[12]  Bernhard Schölkopf,et al.  Fast Approximation of Support Vector Kernel Expansions, and an Interpretation of Clustering as Approximation in Feature Spaces , 1998, DAGM-Symposium.

[13]  Dennis DeCoste,et al.  Anytime Query-Tuned Kernel Machines via Cholesky Factorization , 2003, SDM.

[14]  Bernhard Schölkopf,et al.  Training Invariant Support Vector Machines , 2002, Machine Learning.

[15]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[16]  Gunnar Rätsch,et al.  Input space versus feature space in kernel-based methods , 1999, IEEE Trans. Neural Networks.