Beyond Traditional Kernels: Classification in Two Dissimilarity-Based Representation Spaces

Proximity captures the degree of similarity between examples and is therefore fundamental to learning. Learning from pairwise proximity data usually relies either on kernel methods with specifically designed kernels or on the nearest neighbor (NN) rule. Kernel methods are powerful, but they often cannot handle arbitrary proximities without corrections. The NN rule can work well in such cases, but it suffers from making purely local decisions. This paper explains, and provides insights into, two simple yet powerful alternatives for cases in which neither conventional kernel methods nor the NN rule performs best. These strategies use two proximity-based representation spaces (RSs) in which accurate classifiers are trained on all training objects, yet require comparisons to only a small set of prototypes. They can handle all meaningful dissimilarity measures, including non-Euclidean and nonmetric ones. Practical examples illustrate that these RSs can be highly advantageous in supervised learning: simple classifiers built in them tend to outperform the NN rule. Moreover, their computational complexity can be controlled. Consequently, these approaches offer an appealing alternative for learning from proximity data when kernel methods cannot be applied directly or are too costly or impractical, and the NN rule leads to noisy results.
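The core idea can be illustrated with a small sketch. It assumes one common construction of such a representation space, the dissimilarity space, in which every object is represented by the vector of its dissimilarities to a small prototype set R. A simple linear classifier is then trained on all training objects in that space and compared with the 1-NN rule on the same prototypes. The nonmetric Minkowski-type measure with p = 0.5, the synthetic Gaussian data, the random prototype choice, the regularized least-squares classifier, and the helper names (`minkowski_dissim`, `fit_linear`, `predict_linear`, `predict_nn`) are all illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np


def minkowski_dissim(A, B, p=0.5):
    """Pairwise Minkowski-type dissimilarities between rows of A and rows of B.

    For p < 1 this measure is nonmetric (the triangle inequality can fail),
    so it cannot be plugged into a standard kernel method directly.
    """
    diff = np.abs(A[:, None, :] - B[None, :, :]) ** p
    return diff.sum(axis=2) ** (1.0 / p)


def fit_linear(D, y, reg=1.0):
    """Regularized least-squares linear classifier on dissimilarity vectors.

    D: (n, k) dissimilarities of n training objects to k prototypes.
    y: labels in {0, 1}. Returns the weight vector, bias term last.
    """
    Z = np.hstack([D, np.ones((D.shape[0], 1))])   # append a bias column
    t = np.where(y == 1, 1.0, -1.0)                # targets in {-1, +1}
    A = Z.T @ Z + reg * np.eye(Z.shape[1])         # ridge-regularized normal equations
    return np.linalg.solve(A, Z.T @ t)


def predict_linear(D, w):
    """Label 1 where the linear score in the dissimilarity space is positive."""
    Z = np.hstack([D, np.ones((D.shape[0], 1))])
    return (Z @ w > 0).astype(int)


def predict_nn(D, proto_labels):
    """1-NN rule: assign the label of the closest prototype."""
    return proto_labels[np.argmin(D, axis=1)]


# Hypothetical two-class data: 2-D Gaussians centred at (0, 0) and (3, 3).
rng = np.random.default_rng(0)
n = 100  # objects per class
X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(3.0, 1.0, (n, 2))])
y = np.repeat([0, 1], n)
Xte = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(3.0, 1.0, (n, 2))])
yte = np.repeat([0, 1], n)

# Small prototype set: 5 randomly chosen training objects per class.
idx = np.concatenate(
    [rng.choice(np.flatnonzero(y == c), 5, replace=False) for c in (0, 1)]
)
R, r_lab = X[idx], y[idx]

D_train = minkowski_dissim(X, R)   # (200, 10): all training objects vs prototypes
D_test = minkowski_dissim(Xte, R)  # each test object needs only 10 comparisons

w = fit_linear(D_train, y)
acc_lin = np.mean(predict_linear(D_test, w) == yte)
acc_nn = np.mean(predict_nn(D_test, r_lab) == yte)
print(f"linear classifier in dissimilarity space: {acc_lin:.3f}")
print(f"1-NN rule on the same prototypes:        {acc_nn:.3f}")
```

Classifying a new object costs only k = 10 dissimilarity computations, one per prototype, which is how the size of the prototype set keeps the computational complexity under control. The regularized least-squares rule is just one convenient choice of simple linear classifier; any vector-space classifier could be trained on `D_train` instead.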
