Classification Similarity Learning Using Feature-Based and Distance-Based Representations: A Comparative Study

Automatically measuring the similarity between a pair of objects is a common and important task in machine learning and pattern recognition. Although it has been studied for decades, it has recently attracted increasing interest from the scientific community. Proposed solutions have usually relied on either a feature-based or a distance-based representation to perform learning and classification tasks. This article presents the results of a comparative experimental study of these two approaches for computing similarity scores with a classification-based method. In particular, we use the Support Vector Machine as a flexible combiner, both for a high-dimensional feature space and for a family of distance measures, to learn similarity scores. The approaches have been tested in a content-based image retrieval context using three different repositories. We analyze the influence of both the input data format and the training size on the performance of the classifier. We found that a low-dimensional, multidistance-based representation can be convenient for small to medium-size training sets, whereas it is detrimental as the training size grows.
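
To make the two input formats concrete, the following is a minimal sketch of how a pair of images might be encoded under each representation. The abstract does not specify the construction, so everything here is an assumption made for illustration: the descriptor names, the concatenation of both images' descriptors as the feature-based format, and the use of Minkowski distances of orders 1 and 2 as the family of distance measures.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: each image is a mapping from descriptor name to values.
ImageFeatures = Dict[str, Sequence[float]]


def minkowski(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def feature_based_pair(x: ImageFeatures, y: ImageFeatures,
                       names: Sequence[str]) -> List[float]:
    """Assumed feature-based format: all descriptors of x, then all of y."""
    vec: List[float] = []
    for img in (x, y):
        for name in names:
            vec.extend(img[name])
    return vec


def distance_based_pair(x: ImageFeatures, y: ImageFeatures,
                        names: Sequence[str],
                        orders: Sequence[float] = (1.0, 2.0)) -> List[float]:
    """Assumed distance-based format: one distance per (descriptor, order)."""
    return [minkowski(x[n], y[n], p) for n in names for p in orders]


# Toy descriptors, invented for this example only.
names = ["color", "texture"]
img_a = {"color": [0.0, 1.0, 0.0], "texture": [3.0, 4.0]}
img_b = {"color": [1.0, 1.0, 0.0], "texture": [0.0, 0.0]}

feat_vec = feature_based_pair(img_a, img_b, names)   # grows with descriptor size
dist_vec = distance_based_pair(img_a, img_b, names)  # fixed, low-dimensional

print(len(feat_vec), len(dist_vec))
```

Under these assumptions, either vector, paired with a similar/dissimilar label, could serve as one training example for a classifier such as the Support Vector Machine. The distance-based vector has a length that depends only on the number of descriptors and distance orders, not on descriptor dimensionality, which is the sense in which it is low-dimensional.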
