Efficient kNN Classification With Different Numbers of Nearest Neighbors

The $k$ nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and strong classification performance. However, it is impractical for traditional kNN methods to assign a single fixed $k$ value (even one set by experts) to all test samples. Previous solutions assign different $k$ values to different test samples via cross validation, but this is usually time-consuming. This paper proposes a kTree method that learns different optimal $k$ values for different test/new samples by introducing a training stage into kNN classification. Specifically, in the training stage, the kTree method first learns optimal $k$ values for all training samples with a new sparse reconstruction model, and then constructs a decision tree (namely, a kTree) using the training samples and the learned optimal $k$ values. In the test stage, the kTree quickly outputs the optimal $k$ value for each test sample, and kNN classification is then conducted using that learned optimal $k$ value and all training samples. As a result, the proposed kTree method has a running cost similar to, but higher classification accuracy than, traditional kNN methods, which assign a fixed $k$ value to all test samples. Moreover, the proposed kTree method needs less running cost but achieves classification accuracy similar to recent kNN methods that assign different $k$ values to different test samples. This paper further proposes an improved version of the kTree method (namely, the k*Tree method) that speeds up the test stage by additionally storing information about the training samples in the leaf nodes of the kTree, such as the training samples located in each leaf node, their kNNs, and the nearest neighbors of these kNNs. The resulting decision tree, called a k*Tree, enables kNN classification to be conducted using a subset of the training samples stored in the leaf nodes rather than all training samples, as in recent kNN methods; this further reduces the running cost of the test stage. Finally, experimental results on 20 real data sets show that the proposed methods (i.e., kTree and k*Tree) are much more efficient than the compared methods in terms of classification tasks.
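To make the two-stage idea concrete, below is a minimal Python sketch of the kTree pipeline using scikit-learn. The paper learns per-sample optimal $k$ values with a sparse reconstruction model and builds its own decision tree; as a stand-in, this sketch uses a simple leave-one-out search over candidate $k$ values (the `learn_optimal_k` function and `k_candidates` set are hypothetical illustrations, not the authors' method) and an off-the-shelf `DecisionTreeRegressor` in place of the kTree construction.

```python
# Minimal sketch of the kTree two-stage pipeline, assuming scikit-learn.
# Stage 1 (training): assign each training sample an "optimal" k, then fit
# a decision tree that maps feature vectors to k values.
# Stage 2 (test): the tree predicts k for each test sample, and standard
# kNN classification is run with that k over all training samples.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor

def learn_optimal_k(X_train, y_train, k_candidates=(1, 3, 5, 7, 9)):
    """Hypothetical stand-in for the paper's sparse reconstruction model:
    for each training sample, pick the smallest candidate k under which a
    leave-one-out kNN classifier labels that sample correctly."""
    n = len(X_train)
    best_k = np.empty(n, dtype=int)
    for i in range(n):
        mask = np.arange(n) != i  # leave sample i out of its own neighborhood
        best_k[i] = k_candidates[0]
        for k in k_candidates:
            knn = KNeighborsClassifier(n_neighbors=k).fit(X_train[mask], y_train[mask])
            if knn.predict(X_train[i:i + 1])[0] == y_train[i]:
                best_k[i] = k
                break
    return best_k

# Toy data, since the sketch is self-contained.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Training stage: learn per-sample k values and fit the "kTree".
k_values = learn_optimal_k(X_train, y_train)
ktree = DecisionTreeRegressor(max_depth=5).fit(X_train, k_values)

# Test stage: predict k per test sample, then classify with kNN.
X_test = rng.normal(size=(10, 5))
for x in X_test:
    k = max(1, int(round(ktree.predict(x.reshape(1, -1))[0])))
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k = {k}, predicted label = {knn.predict(x.reshape(1, -1))[0]}")
```

The k*Tree variant described above would further store, in each leaf of the tree, the training samples that fall into that leaf together with their kNNs and those neighbors' nearest neighbors, so that the test-stage kNN search runs over that stored subset rather than all training samples.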
