Classification of dissimilarity data with a new flexible Mahalanobis-like metric

Statistical pattern recognition traditionally relies on feature-based representations. In many applications, however, no such vector representation is available and only proximity data (distances, dissimilarities, similarities, ranks, etc.) are at hand. In this paper, we consider discriminant analysis from dissimilarity data. Our approach is inspired by the Gaussian classifier: we define decision rules that mimic the behavior of a linear or a quadratic classifier. The number of parameters is small (two per class). Numerical experiments on artificial and real data show that, compared with Support Vector Machines and the kNN classifier, the method achieves (a) a lower or equivalent error rate, (b) equivalent CPU time, and (c) greater robustness with sparse dissimilarity data.
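
To make the idea of a Gaussian-like rule on dissimilarities concrete, here is a minimal sketch. It is not the decision rule of the paper, whose exact form the abstract does not give. It assumes the dissimilarities behave like Euclidean distances, so that the squared distance to a class centroid can be recovered from pairwise dissimilarities alone, and it keeps two quantities per class: a within-class inertia and a prior. The function names `fit` and `predict`, the `shared_scale` flag, and the scoring formula are all illustrative assumptions.

```python
import numpy as np

def fit(D, y):
    """Estimate two quantities per class from a square dissimilarity matrix D.

    Assumption (not from the paper): each class k is summarised by its
    within-class inertia and its prior, both computed from D only.
    """
    y = np.asarray(y)
    params = {}
    for k in np.unique(y):
        idx = np.flatnonzero(y == k)
        D2 = D[np.ix_(idx, idx)] ** 2
        # Mean squared distance to the (implicit) class centroid.
        inertia = D2.sum() / (2 * len(idx) ** 2)
        params[k] = (idx, inertia, len(idx) / len(y))
    return params

def predict(d_new, params, shared_scale=False):
    """Classify one object from its dissimilarities d_new to all training objects.

    shared_scale=True pools the inertia across classes (linear-like rule);
    shared_scale=False keeps one scale per class (quadratic-like rule).
    """
    pooled = sum(prior * inertia for _, inertia, prior in params.values())
    scores = {}
    for k, (idx, inertia, prior) in params.items():
        # Squared distance to the class centroid, from dissimilarities only
        # (exact when the dissimilarities are Euclidean distances).
        dist2 = np.mean(d_new[idx] ** 2) - inertia
        s2 = pooled if shared_scale else inertia
        # Spherical Gaussian-like score; smaller is better.
        scores[k] = dist2 / s2 + np.log(s2) - 2 * np.log(prior)
    return min(scores, key=scores.get)

# Toy usage: two well-separated groups on a line, described only by distances.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
labels = np.array([0, 0, 0, 1, 1, 1])
D = np.abs(points[:, None] - points[None, :])
model = fit(D, labels)
label = predict(np.abs(points - 1.5), model)
```

The sketch assumes every class has a strictly positive inertia; a class made of identical objects would need a floor on the scale before dividing.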
