D2KE: From Distance to Kernel and Embedding

For many machine learning problem settings, particularly with structured inputs such as sequences or sets of objects, a distance measure between inputs can be specified more naturally than a feature representation. However, most standard machine learning models are designed for inputs with a vector feature representation. In this work, we consider the estimation of a function $f:\mathcal{X} \rightarrow \R$ based solely on a dissimilarity measure $d:\mathcal{X}\times\mathcal{X} \rightarrow \R$ between inputs. In particular, we propose a general framework to derive a family of \emph{positive definite kernels} from a given dissimilarity measure, which subsumes the widely used \emph{representative-set method} as a special case, and relates to the well-known \emph{distance substitution kernel} in a limiting case. We show that functions in the corresponding Reproducing Kernel Hilbert Space (RKHS) are Lipschitz-continuous w.r.t. the given distance metric. We provide a tractable algorithm to estimate a function from this RKHS, and show that it generalizes better than Nearest-Neighbor estimates. Our approach draws from the literature on Random Features, but instead of deriving feature maps from an existing kernel, we construct novel kernels from a random feature map that we specify given the distance measure. We conduct classification experiments in such disparate domains as strings, time series, and sets of vectors, where our proposed framework compares favorably to existing distance-based learning methods such as $k$-nearest-neighbors, distance-substitution kernels, pseudo-Euclidean embedding, and the representative-set method.
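To make the idea of building a kernel from a distance-based random feature map concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation described in the paper: the feature form $\exp(-\gamma\, d(x,\omega))$, the scale parameter `gamma`, the helper names (`edit_distance`, `random_strings`, `embed`), and the choice of short random strings as the random objects $\omega$ are all hypothetical and are not specified in the abstract. The Gram matrix of the resulting embedding is positive semidefinite by construction, since it is a matrix of inner products.

```python
import math
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def random_strings(num, max_len, alphabet, rng):
    """Draw `num` random strings of length 1..max_len (hypothetical choice of random objects)."""
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
            for _ in range(num)]


def embed(x, anchors, dist, gamma):
    """Map x to a vector with one coordinate per random object.

    Assumed feature form: exp(-gamma * d(x, omega)) / sqrt(R), with R = len(anchors).
    """
    scale = 1.0 / math.sqrt(len(anchors))
    return [scale * math.exp(-gamma * dist(x, w)) for w in anchors]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
anchors = random_strings(num=64, max_len=6, alphabet="abc", rng=rng)

inputs = ["abca", "abcb", "ccccc"]
features = [embed(s, anchors, edit_distance, gamma=0.5) for s in inputs]

# Approximate kernel matrix: inner products of the embedded inputs.
gram = [[dot(u, v) for v in features] for u in features]
```

The vectors in `features` can be passed to any learner that expects fixed-length inputs, for example a linear classifier. If the random objects were instead drawn from the training data, the embedding would resemble a representative-set style construction, which the abstract describes as a special case of the framework.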
