Classification on dissimilarity data: a first look

In dissimilarity (distance) data, each pair of objects is characterized by a value that expresses the magnitude of the difference between them. Such data can be classified using various approaches, provided that a new object is represented by its distances to the training samples. This paper discusses a number of ways to tackle such a classification problem. Two types of methods are investigated: feature-based decision rules (i.e. those that interpret the distance data as a feature space) and rank-based decision rules. Experiments conducted on real datasets demonstrate that the feature-based classifiers often outperform the rank-based ones. The normal-based decision rules perform well, since summation-based distances (which appear frequently in practice) are, under general conditions, approximately normally distributed. In addition, the support vector classifier achieves high accuracy, particularly in distance spaces of very high dimensionality.
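
The contrast between the two families of rules can be illustrated with a minimal sketch. It is not the paper's experimental setup: the synthetic two-class Gaussian data, the Euclidean distance, the regularisation constant, the equal-prior assumption, and all function names (`pairwise_distances`, `nn_rule`, `fit_linear_normal`, `predict_linear_normal`, `demo`) are assumptions made for illustration. The nearest-neighbour rule stands in for a rank-based rule, and a regularised linear normal-based classifier trained on the rows of the distance matrix stands in for a feature-based rule.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def nn_rule(D_test, y_train):
    """Rank-based rule: assign the label of the nearest training object."""
    return y_train[np.argmin(D_test, axis=1)]


def fit_linear_normal(D_train, y_train, reg=0.1):
    """Feature-based rule: two-class linear normal-based classifier.

    Each row of D_train (distances to all training objects) is treated
    as a feature vector. Equal class priors are assumed, and the pooled
    covariance is regularised (assumption) because the number of
    features equals the number of training objects.
    """
    classes = np.unique(y_train)
    X0 = D_train[y_train == classes[0]]
    X1 = D_train[y_train == classes[1]]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(y_train) - 2)
    dim = S.shape[0]
    S = S + reg * (np.trace(S) / dim) * np.eye(dim)
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return classes, w, b


def predict_linear_normal(model, D_test):
    """Classify test objects from their distances to the training objects."""
    classes, w, b = model
    return classes[(D_test @ w + b > 0).astype(int)]


def demo(seed=0, n_per_class=20):
    """Compare both rules on synthetic, well-separated Gaussian classes."""
    rng = np.random.default_rng(seed)

    def sample(n):
        X = np.vstack([rng.normal(0.0, 1.0, (n, 2)),
                       rng.normal(4.0, 1.0, (n, 2))])
        return X, np.repeat([0, 1], n)

    X_tr, y_tr = sample(n_per_class)
    X_te, y_te = sample(100)
    D_tr = pairwise_distances(X_tr, X_tr)   # training objects vs. themselves
    D_te = pairwise_distances(X_te, X_tr)   # new objects vs. training objects
    acc_nn = np.mean(nn_rule(D_te, y_tr) == y_te)
    acc_lin = np.mean(predict_linear_normal(fit_linear_normal(D_tr, y_tr), D_te) == y_te)
    return acc_nn, acc_lin
```

Calling `demo()` returns the test accuracies of the two rules on this toy problem. Both rules use only the distance matrices, never the original coordinates, which mirrors the setting in which a new object is represented solely by its distances to the training samples.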
