THE INFLUENCE OF WEIGHTING THE K-OCCURRENCES ON HUBNESS-AWARE CLASSIFICATION METHODS

Hubness is a phenomenon present in many high-dimensional data sets. It is related to the skewness in the distribution of k-occurrences, i.e., occurrences of data points in the k-neighbor sets of other data points. Several hubness-aware methods that focus on exploiting this phenomenon have recently been proposed. In this paper, we examine the potential impact of weighting the k-occurrences, by taking into account the distance between the respective data points, on hubness-aware nearest-neighbor methods, more specifically hw-kNN, h-FNN and HIKNN. We show that such distance-based weighting can be both advantageous and detrimental and that it influences different methods in different ways.
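To make the notion of k-occurrences concrete, the following minimal sketch counts how often each point appears in the k-neighbor sets of the other points, with an optional distance-based weight. The function name `k_occurrences` and the weight `1 / (1 + d)` are illustrative assumptions only; the abstract does not specify the weighting scheme actually used, and this is not the implementation behind hw-kNN, h-FNN or HIKNN.

```python
import math


def k_occurrences(points, k, weighted=False):
    """Count occurrences of each point in the k-neighbor sets of the others.

    With weighted=True, each occurrence contributes 1 / (1 + d) instead of 1,
    where d is the Euclidean distance between the two points. This weight is
    a hypothetical choice made for illustration.
    """
    n = len(points)
    counts = [0.0] * n
    for i in range(n):
        # Distances from point i to every other point, nearest first
        # (ties are broken by the smaller index).
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            counts[j] += 1.0 / (1.0 + d) if weighted else 1.0
    return counts


# Four points on a line; the last one is far from the rest.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
plain = k_occurrences(pts, k=1)
soft = k_occurrences(pts, k=1, weighted=True)
```

In this toy example the second point occurs in two 1-neighbor sets and the distant fourth point in none, which is the kind of skew the abstract refers to. Under the assumed weight, the occurrence of the third point (as the neighbor of the distant point) counts for much less than the occurrences among the close points.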
