Using Dominant Sets for k-NN Prototype Selection

k-Nearest Neighbors (k-NN) is one of the most important and widely adopted non-parametric classification methods in pattern recognition. It has evolved in several respects over the last 50 years, and one of its best-known variants relies on prototypes: a prototype summarizes a group of similar training points, drastically reducing the number of comparisons needed at classification time. Prototypes are therefore typically employed when the training set is large. In this paper, using the dominant set clustering framework, we propose four novel strategies for prototype generation, which produce representative prototypes that mirror the underlying class structure in an expressive and effective way. Our strategies boost k-NN classification performance: across heterogeneous metrics and 15 diverse datasets, our approach ranks among the 6 best prototype-based k-NN approaches, at a computational cost far lower than that of all competitors. In addition, we show that our proposal outperforms a linear SVM in a pedestrian detection scenario.
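As a rough illustration of the general idea (not one of the four strategies proposed in the paper, which the abstract does not detail), the sketch below extracts dominant sets from the points of a single class with discrete replicator dynamics, turns each set into one prototype, and classifies queries by their nearest prototype. The Gaussian affinity, the bandwidth `sigma`, the membership `cutoff`, the weighted-mean prototype, and all function names are assumptions made for this example.

```python
import numpy as np

def dominant_set(A, tol=1e-8, max_iter=2000):
    """Discrete replicator dynamics on a symmetric, non-negative affinity
    matrix A with zero diagonal. Returns a weight vector on the simplex;
    large entries mark the members of the extracted dominant set."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)              # start from the simplex barycenter
    for _ in range(max_iter):
        Ax = A @ x
        denom = x @ Ax
        if denom <= 0:                   # all affinities vanished; keep x
            break
        x_new = x * Ax / denom
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

def class_prototypes(X, sigma=1.0, cutoff=1e-5, max_protos=10):
    """Peel dominant sets off the points of ONE class (rows of X).
    Each set yields one prototype: the mean of its members weighted by x
    (an assumption of this sketch). Points still unassigned once
    max_protos is reached are simply dropped."""
    remaining = np.arange(len(X))
    protos = []
    while len(remaining) > 0 and len(protos) < max_protos:
        P = X[remaining]
        if len(P) == 1:                  # a lone point is its own prototype
            protos.append(P[0])
            break
        d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
        A = np.exp(-d2 / (2.0 * sigma ** 2))   # assumed Gaussian affinity
        np.fill_diagonal(A, 0.0)
        x = dominant_set(A)
        members = x >= cutoff * x.max()  # relative cutoff, never empty
        w = x[members] / x[members].sum()
        protos.append(w @ P[members])
        remaining = remaining[~members]
    return np.array(protos)

def predict_1nn(protos, proto_labels, Q):
    """Label each query row of Q with the label of its nearest prototype."""
    d2 = ((Q[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return proto_labels[d2.argmin(axis=1)]
```

Running `class_prototypes` once per class and stacking the results gives a prototype set with at most `max_protos` entries per class, so a query is compared against a handful of prototypes instead of every training point.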
