On using Additional Unlabeled Data for Improving Dissimilarity-Based Classifications

This paper reports experimental results on using unlabeled data together with labeled data to improve the classification accuracy of dissimilarity-based classification (DBC) (Pȩkalska, E. and Duin, R. P. W., 2005). In DBC, classifiers are built not on the feature measurements of individual objects but on a suitable dissimilarity measure among the objects. In image classification tasks, however, one persistent difficulty is the lack of information caused by an insufficient amount of data. To address this problem in DBC, we study a new way of measuring the dissimilarity between two object images by using the well-known one-shot similarity (OSS) metric (Wolf, L. et al., 2009). In DBC using OSS, the dissimilarity is measured with the help of unlabeled (background) data that do not belong to the classes being learned and, consequently, do not require labeling. From this point of view, the classification is performed in a semi-supervised learning (SSL) framework. Our experimental results, obtained with well-known benchmarks, demonstrate that when the cardinalities of the unlabeled data set and the prototype set are chosen appropriately, using additional unlabeled data for the OSS metric in SSL can improve the classification accuracy of DBC.
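As a rough illustration of the idea, the following Python sketch computes a symmetric OSS score between two feature vectors from an unlabeled background set, then builds a dissimilarity representation against a prototype set. It is a minimal sketch under stated assumptions, not the implementation used in the paper: it assumes the LDA-based form of OSS described by Wolf et al., a ridge term `reg` to keep the background covariance invertible, and a simple negation to turn the similarity score into a dissimilarity. The function names and the `reg` parameter are hypothetical.

```python
import numpy as np


def oss_lda(x_i, x_j, background, reg=1e-3):
    """Symmetric one-shot similarity of x_i and x_j (LDA-based sketch).

    background: array of shape (n_background, n_features) holding unlabeled
    samples that are assumed not to belong to the classes being learned.
    reg: assumed ridge term added to the background covariance.
    """
    mu = background.mean(axis=0)
    centered = background - mu
    cov = centered.T @ centered / len(background)
    cov = cov + reg * np.eye(background.shape[1])

    def one_sided(a, b):
        # LDA direction separating the single positive sample `a`
        # from the background set, then the signed score of `b`
        # relative to the midpoint between `a` and the background mean.
        w = np.linalg.solve(cov, a - mu)
        return float(w @ (b - (a + mu) / 2.0))

    # Average both directions so the score is symmetric in its arguments.
    return 0.5 * (one_sided(x_i, x_j) + one_sided(x_j, x_i))


def oss_dissimilarity_representation(objects, prototypes, background, reg=1e-3):
    """Map each object to a vector of dissimilarities to the prototypes.

    Assumption: dissimilarity is taken as the negated OSS score.
    """
    return np.array(
        [[-oss_lda(x, p, background, reg) for p in prototypes] for x in objects]
    )
```

The rows of the returned matrix can then be passed to any ordinary classifier trained in the dissimilarity space. The sizes of `background` and `prototypes` correspond to the two cardinalities that the abstract says must be chosen appropriately.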

[1] Mário A. T. Figueiredo, et al. Similarity-based classification of sequences using hidden Markov models, 2004, Pattern Recognit.

[2] Robert P. W. Duin, et al. PRTools3: A Matlab Toolbox for Pattern Recognition, 2000.

[3] Robert P. W. Duin, et al. Beyond Traditional Kernels: Classification in Two Dissimilarity-Based Representation Spaces, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[4] Shai Ben-David, et al. Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning, 2008, COLT.

[5] B. John Oommen, et al. On using prototype reduction schemes to optimize dissimilarity-based classification, 2007, Pattern Recognit.

[6] Avrim Blum, et al. The Bottleneck, 2021, Monopsony Capitalism.

[7] Yi Liu, et al. SemiBoost: Boosting for Semi-Supervised Learning, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Bernhard Schölkopf, et al. Introduction to Semi-Supervised Learning, 2006, Semi-Supervised Learning.

[9] Tal Hassner, et al. The One-Shot similarity kernel, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10] José Salvador Sánchez, et al. Prototype Selection in Imbalanced Data for Dissimilarity Representation - A Preliminary Study, 2012, ICPRAM.

[11] Eugene Charniak, et al. When is Self-Training Effective for Parsing?, 2008, COLING.

[12] Robert P. W. Duin, et al. A generalization of dissimilarity representations using feature lines and feature planes, 2009, Pattern Recognit. Lett.

[13] Tal Hassner, et al. Effective Unconstrained Face Recognition by Combining Multiple Descriptors and Learned Background Statistics, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Robert P. W. Duin, et al. Non-Euclidean Problems in Pattern Recognition Related to Human Expert Knowledge, 2010, ICEIS.

[15] Robert P. W. Duin, et al. The Dissimilarity Representation for Pattern Recognition - Foundations and Applications, 2005, Series in Machine Perception and Artificial Intelligence.