Hash Kernels

We propose hashing to facilitate efficient kernels. This approach generalizes previous work based on sampling, and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we bound the deviation of the hashed kernel matrix from the exact kernel matrix. The method applies to estimation on strings and graphs.
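The idea of computing a kernel through hashing can be illustrated with a minimal feature-hashing sketch in Python. Each sparse feature name is mapped to one of m buckets, colliding features are summed, and the kernel is approximated by the inner product of the compressed vectors. Several details are assumptions that the text above does not fix. MD5 is used as the hash function. A sign in {-1, +1} is drawn from the same digest, following the signed variant of feature hashing. Character trigrams serve as the string representation. The names `hash_features`, `sparse_dot`, `hashed_kernel` and `char_ngrams` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Mapping, Tuple

# Assumption: MD5 stands in for the hash function, and a sign bit is taken
# from the same digest (signed feature hashing); neither is fixed by the text.


def _bucket_and_sign(feature: str, m: int) -> Tuple[int, int]:
    """Map a feature name to a bucket in [0, m) and a sign in {-1, +1}."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "little") % m
    sign = 1 if digest[8] & 1 else -1
    return bucket, sign


def hash_features(x: Mapping[str, float], m: int) -> Dict[int, float]:
    """Compress a sparse feature map into at most m signed, hashed coordinates.

    No vocabulary is stored, so features arriving in a stream can be
    hashed as soon as they are seen.
    """
    out: Dict[int, float] = defaultdict(float)
    for feature, value in x.items():
        bucket, sign = _bucket_and_sign(feature, m)
        out[bucket] += sign * value  # colliding features add up in one bucket
    return dict(out)


def sparse_dot(a: Mapping, b: Mapping) -> float:
    """Inner product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def hashed_kernel(x: Mapping[str, float], y: Mapping[str, float], m: int) -> float:
    """Approximate the linear kernel <x, y> by <hash(x), hash(y)>."""
    return sparse_dot(hash_features(x, m), hash_features(y, m))


def char_ngrams(s: str, n: int = 3) -> Dict[str, float]:
    """Bag of character n-grams, a simple sparse representation of a string."""
    counts: Dict[str, float] = defaultdict(float)
    for i in range(len(s) - n + 1):
        counts[s[i : i + n]] += 1.0
    return dict(counts)


if __name__ == "__main__":
    x, y = char_ngrams("hash kernels"), char_ngrams("hashed kernel")
    print("exact  :", sparse_dot(x, y))
    for m in (4, 64, 2**20):
        print(f"m={m:>7}:", hashed_kernel(x, y, m))
```

A small m saves memory but causes more collisions, so the hashed value drifts further from the exact kernel. As m grows, collisions become rare and the two values agree more closely. This trade-off is the kind of deviation the bounds mentioned above are meant to quantify.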
