Support Vector Machines, Data Reduction, and Approximate Kernel Matrices

The computational and communication costs of training support vector machines (SVMs) on large-scale data sets, for example in distributed networking systems, are often prohibitively high. Practitioners are therefore frequently forced to run the algorithm on an approximate version of the kernel matrix induced by some degree of data reduction. In this paper, we study the tradeoff between data reduction and the resulting loss in classification performance. We introduce and analyze a consistent estimator of the SVM's achieved classification error, and then derive approximate upper bounds on the perturbation of this estimator caused by the data reduction. The bounds are shown to be empirically tight across a wide range of domains, making it practical for a practitioner to determine how much data reduction is acceptable given a permissible loss in classification performance.
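To make the tradeoff concrete, the sketch below trains an SVM on progressively coarser kernel approximations and records the loss in test accuracy. It is a minimal illustration assuming scikit-learn, with the Nyström feature map standing in for "data reduction"; the paper's error estimator and perturbation bounds are not reproduced here, and the data set is a hypothetical synthetic stand-in.

```python
# Illustrative only: measure how classification accuracy degrades as the
# kernel matrix is approximated at lower and lower rank (more data reduction).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification data (stand-in for a large-scale data set).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reference point: an SVM on the exact RBF kernel.
exact = SVC(kernel="rbf", gamma=0.1).fit(X_train, y_train)
print(f"exact kernel     test accuracy = {exact.score(X_test, y_test):.3f}")

# Approximation: Nystroem maps the data into a rank-r feature space, so a
# linear SVM on the mapped features approximates the RBF-kernel SVM. Smaller
# r means more data reduction and, typically, a larger perturbation of the
# achieved classification error.
for rank in (16, 64, 256):
    feat = Nystroem(gamma=0.1, n_components=rank, random_state=0)
    Z_train = feat.fit_transform(X_train)
    Z_test = feat.transform(X_test)
    approx = LinearSVC().fit(Z_train, y_train)
    print(f"rank {rank:4d} approx test accuracy = "
          f"{approx.score(Z_test, y_test):.3f}")
```

In this setup the gap between the exact and approximate accuracies plays the role of the performance loss that the paper's bounds aim to predict in advance, letting the rank (the degree of data reduction) be chosen to meet a target loss.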
