Example-dependent Basis Vector Selection for Kernel-Based Classifiers

We study methods for speeding up classification time of kernel-based classifiers. Existing solutions are based on explicitly seeking sparse classifiers during training, or by using budgeted versions of the classifier where one directly limits the number of basis vectors allowed. Here, we propose a more flexible alternative: instead of using the same basis vectors over the whole feature space, our solution uses different basis vectors in different parts of the feature space. At the core of our solution lies an optimization procedure that, given a set of basis vectors, finds a good partition of the feature space and good subsets of the existing basis vectors. Using this procedure repeatedly, we build trees whose internal nodes specify feature space partitions and whose leaves implement simple kernel classifiers. Experiments suggest that our method reduces classification time significantly while maintaining performance. In addition, we propose several heuristics that also perform well.

[1]  Andy J. Keane,et al.  Some Greedy Learning Algorithms for Sparse Regression and Classification with Mercer Kernels , 2003, J. Mach. Learn. Res..

[2]  Bernhard Schölkopf,et al.  A Direct Method for Building Sparse Kernel Learning Algorithms , 2006, J. Mach. Learn. Res..

[3]  Antti Ukkonen,et al.  The Support Vector Tree , 2010, Algorithms and Applications.

[4]  Federico Girosi,et al.  Reducing the run-time complexity of Support Vector Machines , 1999 .

[5]  FreundYoav,et al.  Large Margin Classification Using the Perceptron Algorithm , 1999 .

[6]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[7]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[8]  Koby Crammer,et al.  Online Classification on a Budget , 2003, NIPS.

[10]  Alexander J. Smola,et al.  Online learning with kernels , 2001, IEEE Transactions on Signal Processing.

[11]  Christopher J. C. Burges,et al.  Simplified Support Vector Decision Rules , 1996, ICML.

[12]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[13]  Barbara Caputo,et al.  Bounded Kernel-Based Online Learning , 2009, J. Mach. Learn. Res..

[14]  Jason Weston,et al.  Online (and Offline) on an Even Tighter Budget , 2005, AISTATS.

[15]  Jason Weston,et al.  Fast Kernel Classifiers with Online and Active Learning , 2005, J. Mach. Learn. Res..

[16]  Roni Khardon,et al.  Noise Tolerant Variants of the Perceptron Algorithm , 2007, J. Mach. Learn. Res..

[17]  Tom Downs,et al.  Exact Simplification of Support Vector Solutions , 2002, J. Mach. Learn. Res..

[18]  Jiun-Hung Chen,et al.  Reducing SVM classification time using multiple mirror classifiers , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[19]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .

[20]  Thorsten Joachims,et al.  Sparse kernel SVMs via cutting-plane training , 2009, Machine Learning.

[21]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[22]  David H. Mathews,et al.  Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change , 2006, BMC Bioinformatics.

[23]  Barbara Caputo,et al.  The projectron: a bounded kernel-based Perceptron , 2008, ICML '08.

[24]  João Gama,et al.  Functional Trees , 2001, Machine Learning.

[25]  Yoram Singer,et al.  Support Vector Machines on a Budget , 2006, NIPS.

[26]  Bernhard Schölkopf,et al.  Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference , 2004, NIPS 2004.

[27]  Yoram Singer,et al.  The Forgetron: A Kernel-Based Perceptron on a Budget , 2008, SIAM J. Comput..

[28]  K. Bennett,et al.  A support vector machine approach to decision trees , 1998, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227).

[29]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[30]  Bernhard Schölkopf,et al.  Improving the Accuracy and Speed of Support Vector Machines , 1996, NIPS.

[31]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.