Fast Prediction with SVM Models Containing RBF Kernels

We present an approximation scheme for support vector machine models that use an RBF kernel. A second-order Maclaurin series approximation is used for exponentials of inner products between support vectors and test instances. The approximation is applicable to all kernel methods featuring sums of kernel evaluations and makes no assumptions regarding data normalization. The prediction cost of an approximated model no longer depends on the number of support vectors; instead, it is quadratic in the number of input dimensions. If the number of input dimensions is small compared to the number of support vectors, the approximated model is significantly faster in prediction and has a smaller memory footprint. We assess the gain in prediction speed in a set of practical tests using an optimized C++ implementation. We additionally provide a method to verify the approximation accuracy, prior to training models or during run-time, to ensure the loss in accuracy remains acceptable and within known bounds.
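The idea can be illustrated with a minimal sketch in plain Python (not the optimized C++ implementation mentioned above). It assumes the common RBF form k(x, z) = exp(-gamma * ||x - z||^2), which factors as exp(-gamma * ||x||^2) * exp(-gamma * ||z||^2) * exp(2 * gamma * x.z); only the last factor is replaced by its second-order Maclaurin series 1 + t + t^2 / 2 with t = 2 * gamma * x.z. The names `gamma`, `weights`, `bias`, `compress`, `approx_decision` and `max_exponent`, as well as the toy data, are assumptions made for illustration and do not come from the text.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def exact_decision(svs, weights, bias, gamma, z):
    """Exact expansion: sum_i w_i * exp(-gamma * ||x_i - z||^2) + bias."""
    total = bias
    for x, w in zip(svs, weights):
        sq_dist = sum((p - q) ** 2 for p, q in zip(x, z))
        total += w * math.exp(-gamma * sq_dist)
    return total

def compress(svs, weights, gamma):
    """Fold all support vectors into a scalar c, a vector v and a d-by-d matrix M."""
    d = len(svs[0])
    c = 0.0
    v = [0.0] * d
    M = [[0.0] * d for _ in range(d)]
    for x, w in zip(svs, weights):
        s = w * math.exp(-gamma * dot(x, x))  # weight times the factor that depends only on x_i
        c += s                                # zeroth-order term
        for j in range(d):
            v[j] += 2.0 * gamma * s * x[j]    # first-order term
            for k in range(d):
                # second-order term: (2*gamma*x.z)^2 / 2 = 2*gamma^2 * z^T (x x^T) z
                M[j][k] += 2.0 * gamma ** 2 * s * x[j] * x[k]
    return c, v, M

def approx_decision(c, v, M, bias, gamma, z):
    """Approximate prediction; the cost is O(d^2), independent of the number of support vectors."""
    quad = sum(z[j] * dot(M[j], z) for j in range(len(z)))
    return math.exp(-gamma * dot(z, z)) * (c + dot(v, z) + quad) + bias

def max_exponent(svs, gamma, z):
    """Cauchy-Schwarz upper bound on |2 * gamma * x_i.z| over all support vectors.

    A hypothetical run-time check: the smaller this value, the closer the
    truncated series is to the true exponential.
    """
    max_norm = max(math.sqrt(dot(x, x)) for x in svs)
    return 2.0 * gamma * max_norm * math.sqrt(dot(z, z))

# Toy model (illustrative values only).
svs = [[0.5, -0.2], [-0.3, 0.4], [0.1, 0.6]]
weights = [1.0, -0.7, 0.4]   # e.g. alpha_i * y_i in an SVM classifier
bias, gamma = 0.1, 0.5
z = [0.3, 0.2]

c, v, M = compress(svs, weights, gamma)
exact = exact_decision(svs, weights, bias, gamma, z)
approx = approx_decision(c, v, M, bias, gamma, z)
print(exact, approx, max_exponent(svs, gamma, z))
```

In this sketch the compressed model stores 1 + d + d^2 numbers regardless of how many support vectors were folded in, which is where the smaller memory footprint for low-dimensional inputs comes from. The bound returned by `max_exponent` depends only on the largest support vector norm and the norm of the test instance, so a check of this kind could be run before training (on the data norms) or per test instance at run-time.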
