On Data-Dependent Random Features for Improved Generalization in Supervised Learning

The randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function with high probability recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires smaller number of random features to achieve a certain generalization error compared to the state-of-the-art while introducing negligible pre-processing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters.

[1]  John C. Duchi,et al.  Learning Kernels with Random Features , 2016, NIPS.

[2]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[4]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[5]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .

[6]  Barnabás Póczos,et al.  Bayesian Nonparametric Kernel-Learning , 2015, AISTATS.

[7]  Jeff G. Schneider,et al.  On the Error of Random Fourier Features , 2015, UAI.

[8]  Alexander J. Smola,et al.  Fastfood: Approximate Kernel Expansions in Loglinear Time , 2014, ArXiv.

[9]  Le Song,et al.  A la Carte - Learning Fast Kernels , 2014, AISTATS.

[10]  N. Cristianini,et al.  Optimizing Kernel Alignment over Combinations of Kernel , 2002 .

[11]  Yiming Yang,et al.  Data-driven Random Fourier Features using Stein Effect , 2017, IJCAI.

[12]  Petros Drineas,et al.  On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning , 2005, J. Mach. Learn. Res..

[13]  Harish Karnick,et al.  Random Feature Maps for Dot Product Kernels , 2012, AISTATS.

[14]  Mehryar Mohri,et al.  Algorithms for Learning Kernels Based on Centered Alignment , 2012, J. Mach. Learn. Res..

[15]  M. Kloft,et al.  l p -Norm Multiple Kernel Learning , 2011 .

[16]  AI Koan,et al.  Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.

[17]  Sanjiv Kumar,et al.  Orthogonal Random Features , 2016, NIPS.

[18]  Vikas Sindhwani,et al.  Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels , 2014, J. Mach. Learn. Res..

[19]  Lorenzo Rosasco,et al.  Generalization Properties of Learning with Random Features , 2016, NIPS.

[20]  Shih-Fu Chang,et al.  Compact Nonlinear Maps and Circulant Extensions , 2015, ArXiv.

[21]  Mehryar Mohri,et al.  Learning Non-Linear Combinations of Kernels , 2009, NIPS.

[22]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[23]  Inderjit S. Dhillon,et al.  Goal-Directed Inductive Matrix Completion , 2016, KDD.

[24]  David Duvenaud,et al.  Optimally-Weighted Herding is Bayesian Quadrature , 2012, UAI.

[25]  Alexander Zien,et al.  lp-Norm Multiple Kernel Learning , 2011, J. Mach. Learn. Res..

[26]  Mehryar Mohri,et al.  Generalization Bounds for Learning Kernels , 2010, ICML.

[27]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[28]  Yoram Baram,et al.  Learning by Kernel Polarization , 2005, Neural Computation.

[29]  Rong Jin,et al.  Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison , 2012, NIPS.

[30]  Ethem Alpaydin,et al.  Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..

[31]  Quanfu Fan,et al.  Random Laplace Feature Maps for Semigroup Kernels on Histograms , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.