Kernel Feature Maps from Arbitrary Distance Metrics

Approximating kernel functions with explicit feature maps has attracted considerable attention in recent years because it substantially speeds up the training of kernel-based algorithms, making them applicable to very large-scale problems. For example, approximations based on random Fourier features are an efficient way to create feature maps for a certain class of shift-invariant kernel functions. However, there are still many kernels for which no algorithm exists to derive such maps. In this work we propose the Distance-Based Feature Map (DBFM), an efficient method that creates approximate feature maps from an arbitrary distance metric using pseudo line projections. We show that our approximation does not depend on the size of the input dataset or on the dimension of the input space. We experimentally evaluate our approach on two real datasets using two metric distance functions and one non-metric distance function.
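To make the idea of a pseudo line projection concrete, the sketch below uses the FastMap-style formula that follows from the law of cosines: given two pivot objects a and b, the coordinate of x along the "line" through them is (d(x,a)^2 + d(a,b)^2 - d(x,b)^2) / (2 d(a,b)), which needs only pairwise distances. This is an illustrative assumption about how such projections can be stacked into a feature vector, not the exact DBFM construction; the names `line_projection`, `make_feature_map`, and `euclidean`, as well as the random choice of pivot pairs, are hypothetical.

```python
import math
import random


def line_projection(x, a, b, dist):
    """Coordinate of x along the pseudo line through pivots a and b.

    Uses only distances (law of cosines), so `dist` can be any
    distance function; no vector-space structure is required.
    """
    d_ab = dist(a, b)
    if d_ab == 0:
        raise ValueError("pivots must be distinct under dist")
    return (dist(x, a) ** 2 + d_ab ** 2 - dist(x, b) ** 2) / (2.0 * d_ab)


def make_feature_map(pivot_pool, dist, n_features, seed=0, max_tries=10000):
    """Build a feature map from randomly chosen pivot pairs.

    Assumption for illustration: pivot pairs are sampled uniformly
    from `pivot_pool`; each pair contributes one feature.
    """
    rng = random.Random(seed)
    pairs = []
    tries = 0
    while len(pairs) < n_features:
        tries += 1
        if tries > max_tries:
            raise ValueError("could not find enough distinct pivot pairs")
        a, b = rng.sample(pivot_pool, 2)
        if dist(a, b) > 0:  # skip degenerate pairs
            pairs.append((a, b))

    def phi(x):
        # One pseudo line projection per pivot pair.
        return [line_projection(x, a, b, dist) for a, b in pairs]

    return phi


def euclidean(u, v):
    """Example metric; any other distance function could be swapped in."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


if __name__ == "__main__":
    pool = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (4.0, 1.0)]
    phi = make_feature_map(pool, euclidean, n_features=5, seed=1)
    print(phi((1.0, 1.0)))
```

In this sketch the length of the feature vector is set by the number of pivot pairs alone, and a linear learner could then be trained on the output of `phi`.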
