Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification

Most data of interest in today's data-mining applications is complex and is typically represented by many different features. Such high-dimensional data is, by its very nature, often difficult for conventional machine-learning algorithms to handle; this is considered an aspect of the well-known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, and machine-learning algorithms should be designed with these factors in mind. At the same time, it has been observed that some of the properties that arise in high dimensions can in fact be exploited to improve algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier that exploits this notion has recently been proposed. In this paper we go a step further by embracing the soft approach and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in the k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvements over both the crisp weighted method and the standard kNN classifier.
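
To make the hubness-based fuzzy voting idea concrete, the following Python sketch illustrates one way such a measure can be built; it is a minimal illustration under simplifying assumptions, not the exact formulation proposed in the paper. It counts, for each training point, how often that point appears among the k nearest neighbors of points of each class (its class hubness), smooths these occurrence profiles, and lets neighbors vote with the resulting fuzzy class memberships instead of crisp labels. The function names, the Laplace-style smoothing parameter lam, and the brute-force neighbor search are illustrative assumptions.

    import numpy as np

    def class_hubness_scores(X, y, k, n_classes):
        """Count how often each training point appears in the k-neighborhoods
        of points of each class (its class hubness)."""
        n = X.shape[0]
        # Brute-force pairwise Euclidean distances (illustrative; fine for small n).
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
        counts = np.zeros((n, n_classes))
        for i in range(n):
            knn = np.argsort(d[i])[:k]         # k nearest neighbors of point i
            counts[knn, y[i]] += 1             # point i contributes to its neighbors' class hubness
        return counts

    def hubness_fuzzy_knn_predict(X_train, y_train, X_test, k, n_classes, lam=1.0):
        """Fuzzy kNN in which each neighbor votes with its smoothed class-hubness
        profile rather than with a crisp label (a sketch, not the paper's method)."""
        counts = class_hubness_scores(X_train, y_train, k, n_classes)
        # Laplace-style smoothing so rarely occurring points (anti-hubs) still
        # produce usable fuzzy votes; lam is an assumed, illustrative value.
        fuzzy = (counts + lam) / (counts.sum(axis=1, keepdims=True) + lam * n_classes)
        preds = np.empty(len(X_test), dtype=int)
        for j, x in enumerate(X_test):
            d = np.linalg.norm(X_train - x, axis=1)
            knn = np.argsort(d)[:k]
            preds[j] = np.argmax(fuzzy[knn].sum(axis=0))   # sum of fuzzy votes per class
        return preds

The smoothing step is included because rarely occurring points (anti-hubs) have nearly empty occurrence profiles and would otherwise contribute uninformative votes.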
