But How Does It Work in Theory? Linear SVM with Random Features

We prove that, under low noise assumptions, the support vector machine with $N\ll m$ random features (RFSVM) can achieve a learning rate faster than $O(1/\sqrt{m})$ on a training set with $m$ samples when an optimized feature map is used. Our work extends the previous fast-rate analysis of random features methods from the least squares loss to the 0-1 loss. We also show that the reweighted feature selection method, which approximates the optimized feature map, improves the performance of RFSVM in experiments on a synthetic data set.
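As a rough illustration of the two ideas in the abstract, the sketch below trains a linear SVM on $N \ll m$ random Fourier features (scikit-learn's RBFSampler implements the Rahimi-Recht features) and then approximates an optimized feature map with a hypothetical reweighting step: draw a larger pool of features, score them with an L1-penalized linear fit, and keep the top $N$. This selection step is a stand-in for the paper's reweighted feature selection procedure, not a reproduction of it, and the dataset, kernel bandwidth, and pool size are illustrative assumptions.

```python
# Sketch of RFSVM: a linear SVM trained on N << m random Fourier features,
# plus a hypothetical reweighted feature selection step.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

m, N, pool = 5000, 50, 500  # samples, kept features, candidate pool (N << m)

X, y = make_classification(n_samples=m, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Plain RFSVM: N random Fourier features for an RBF kernel, then a linear SVM.
rf = RBFSampler(gamma=0.1, n_components=N, random_state=0)
Z_tr = rf.fit_transform(X_tr)
svm = LinearSVC(C=1.0, max_iter=5000).fit(Z_tr, y_tr)
print("RFSVM accuracy:", svm.score(rf.transform(X_te), y_te))

# Reweighted selection (stand-in for the optimized feature map): draw a
# large pool of features, weight them by an L1-penalized fit, keep the top N.
pool_map = RBFSampler(gamma=0.1, n_components=pool, random_state=1)
Z_pool = pool_map.fit_transform(X_tr)
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(Z_pool, y_tr)
top = np.argsort(-np.abs(l1.coef_.ravel()))[:N]
svm2 = LinearSVC(C=1.0, max_iter=5000).fit(Z_pool[:, top], y_tr)
print("Reweighted RFSVM accuracy:",
      svm2.score(pool_map.transform(X_te)[:, top], y_te))
```

Keeping only $N = 50$ of a 500-feature pool reflects the abstract's point that a data-dependent (optimized) feature distribution needs far fewer features than uniform sampling; the paper's actual reweighting scheme differs in its details.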
