But How Does It Work in Theory? Linear SVM with Random Features

We prove that, under low noise assumptions, the support vector machine with $N\ll m$ random features (RFSVM) can achieve a learning rate faster than $O(1/\sqrt{m})$ on a training set with $m$ samples when an optimized feature map is used. Our work extends the previous fast-rate analysis of random features methods from the least squares loss to the 0-1 loss. We also show that the reweighted feature selection method, which approximates the optimized feature map, improves the performance of RFSVM in experiments on a synthetic data set.
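As a rough illustration of the two ideas in the abstract, the sketch below trains a linear SVM on $N \ll m$ random Fourier features (scikit-learn's RBFSampler implements the Rahimi-Recht features) and then approximates an optimized feature map with a hypothetical reweighting step: draw a larger pool of features, score them with an L1-penalized linear fit, and keep the top $N$. This selection step is a stand-in for the paper's reweighted feature selection procedure, not a reproduction of it, and the dataset, kernel bandwidth, and pool size are illustrative assumptions.

```python
# Sketch of RFSVM: a linear SVM trained on N << m random Fourier features,
# plus a hypothetical reweighted feature selection step.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

m, N, pool = 5000, 50, 500  # samples, kept features, candidate pool (N << m)

X, y = make_classification(n_samples=m, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Plain RFSVM: N random Fourier features for an RBF kernel, then a linear SVM.
rf = RBFSampler(gamma=0.1, n_components=N, random_state=0)
Z_tr = rf.fit_transform(X_tr)
svm = LinearSVC(C=1.0, max_iter=5000).fit(Z_tr, y_tr)
print("RFSVM accuracy:", svm.score(rf.transform(X_te), y_te))

# Reweighted selection (stand-in for the optimized feature map): draw a
# large pool of features, weight them by an L1-penalized fit, keep the top N.
pool_map = RBFSampler(gamma=0.1, n_components=pool, random_state=1)
Z_pool = pool_map.fit_transform(X_tr)
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(Z_pool, y_tr)
top = np.argsort(-np.abs(l1.coef_.ravel()))[:N]
svm2 = LinearSVC(C=1.0, max_iter=5000).fit(Z_pool[:, top], y_tr)
print("Reweighted RFSVM accuracy:",
      svm2.score(pool_map.transform(X_te)[:, top], y_te))
```

Keeping only $N = 50$ of a 500-feature pool reflects the abstract's point that a data-dependent (optimized) feature distribution needs far fewer features than uniform sampling; the paper's actual reweighting scheme differs in its details.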
