Predicting the number of nearest neighbors for the k-NN classification algorithm

k-Nearest Neighbor (k-NN) is one of the most widely used classification algorithms. To classify a new instance, k-NN first finds the instance's k nearest neighbors and then assigns it to the majority class among those neighbors. An appropriate number of neighbors is therefore critical to the k-NN classifier's performance; however, at present there is no systematic method for determining the value of k. To address this problem, we propose a novel method that uses back-propagation neural networks to learn the relationship between data set characteristics and the optimal value of k; this learned relationship, together with the characteristics of a new data set, is then used to recommend a value of k for that data set. Experimental results on 49 UCI benchmark data sets show that, compared with the optimal k values, the recommended k values reduce the average classification accuracy of the k-NN classifier by only 1.61%, while greatly shortening the time needed to determine k.
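
The two-phase pipeline the abstract describes can be sketched in code. Below is a minimal, illustrative sketch using scikit-learn; the particular meta-features (sample count, feature count, class count, class entropy), the search range for k, and the use of MLPRegressor as the back-propagation network are assumptions made for illustration, not the paper's exact configuration.

```python
# Minimal sketch of the meta-learning approach: learn a mapping from data set
# characteristics to the optimal k, then recommend k for a new data set.
import numpy as np
from scipy.stats import entropy
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

def meta_features(X, y):
    """Compute a few simple data set characteristics (hypothetical choice)."""
    _, counts = np.unique(y, return_counts=True)
    return np.array([X.shape[0], X.shape[1], len(counts),
                     entropy(counts / counts.sum())])

def best_k(X, y, k_max=30):
    """Find the empirically optimal k by cross-validation (training phase only)."""
    scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
              for k in range(1, k_max + 1)]
    return int(np.argmax(scores)) + 1

def fit_k_recommender(training_datasets):
    """Train a back-propagation network on (meta-features, optimal k) pairs.

    `training_datasets` is a hypothetical list of (X, y) pairs, e.g. the
    UCI benchmark data sets used in the paper.
    """
    M = np.array([meta_features(X, y) for X, y in training_datasets])
    ks = np.array([best_k(X, y) for X, y in training_datasets])
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    model.fit(M, ks)  # trained by back-propagation
    return model

def recommend_k(model, X, y):
    """Recommend k for a new data set without searching over candidate values."""
    k = int(round(model.predict(meta_features(X, y).reshape(1, -1))[0]))
    return max(1, k)
```

The key cost saving is that the expensive cross-validated search over k (`best_k`) is paid once per training data set, while a new data set only requires computing its meta-features and one forward pass through the network.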
