Finding Small Sets of Random Fourier Features for Shift-Invariant Kernel Approximation

Kernel-based learning is popular in machine learning, but many classical kernel methods have at least quadratic runtime complexity. Random Fourier features are an effective way to approximate shift-invariant kernels by an explicit feature expansion, which allows efficient linear models with much lower runtime complexity to be used instead. As a key approach to kernelizing algorithms with linear models, they have been applied successfully in a variety of methods. However, the number of features needed to approximate the kernel well is in general still quite large, incurring substantial memory and runtime costs. Here, we propose a simple test, computable at linear cost, that identifies a small set of random Fourier features, substantially reducing the number of generated features for low-rank kernel matrices while largely preserving representation accuracy. We also provide generalization bounds for the proposed approach.
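For readers unfamiliar with the underlying construction, the following is a minimal sketch of the standard random Fourier feature map of Rahimi and Recht for the Gaussian (RBF) kernel, which this paper builds on. The bandwidth `sigma`, feature count `D`, and data sizes below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, D, sigma=1.0, rng=rng):
    """Map inputs X (n x d) to D random Fourier features whose inner
    products approximate the Gaussian kernel exp(-||x-y||^2 / (2*sigma^2))."""
    n, d = X.shape
    # Sample frequencies from the kernel's spectral density (Bochner's theorem)
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))
    # Random phase offsets
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Compare the explicit feature map against the exact kernel matrix
X = rng.normal(size=(50, 5))
Z = rff_features(X, D=5000)
K_approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq_dists / 2.0)
err = np.abs(K_approx - K_exact).max()
```

The linear model then operates on `Z` directly, so training cost scales with `D` rather than with the number of samples squared; the paper's contribution is a linear-cost test for keeping only a small subset of these `D` features.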
