Redefining nearest neighbor classification in high-dimensional settings

Abstract: In this work, a novel nearest neighbor approach is presented. The main idea is to redefine the distance metric so that it includes only a subset of relevant variables, assuming these variables are equally important to the classification model. Three distance measures are redefined in this way: the traditional squared Euclidean, the Manhattan, and the Chebyshev. These modifications are designed to improve classification performance in high-dimensional applications, where the concept of distance becomes blurred, i.e., all training points become nearly equidistant from one another. In addition, including noisy variables degrades predictive performance when the main patterns are confined to just a few variables, because all variables are weighted equally in the standard metrics. Experimental results on low- and high-dimensional datasets demonstrate the importance of these modifications, which yield superior average performance in terms of Area Under the Curve (AUC) compared with the traditional k-nearest-neighbor approach.
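The central idea, restricting each distance to a fixed subset of equally weighted relevant variables before running kNN, can be illustrated with a minimal sketch. The helper names (masked_distances, knn_predict) and the assumption that the relevant subset is supplied externally are illustrative, not taken from the paper:

```python
import numpy as np

def masked_distances(X_train, x, subset, metric="sqeuclidean"):
    """Distances from query x to every training row, computed only
    over the columns in `subset` (the relevant variables), with all
    selected variables weighted equally."""
    diff = X_train[:, subset] - x[subset]
    if metric == "sqeuclidean":    # squared Euclidean
        return np.sum(diff ** 2, axis=1)
    if metric == "manhattan":      # L1
        return np.sum(np.abs(diff), axis=1)
    if metric == "chebyshev":      # L-infinity
        return np.max(np.abs(diff), axis=1)
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(X_train, y_train, x, subset, k=5, metric="sqeuclidean"):
    """Plain majority-vote kNN over the subset-restricted distance."""
    d = masked_distances(X_train, x, subset, metric)
    nearest = np.argsort(d)[:k]             # indices of the k closest points
    votes = np.bincount(y_train[nearest])   # class counts among neighbors
    return np.argmax(votes)
```

Calling knn_predict with subset = np.arange(X_train.shape[1]) recovers the standard kNN baseline; passing a smaller index set applies the restricted metrics the abstract describes.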
