Efficient kNN Algorithm Based on Graph Sparse Reconstruction

This paper proposes an efficient k-Nearest-Neighbors (kNN) method based on a graph sparse reconstruction framework, called the Graph Sparse kNN (GS-kNN) algorithm. We first design a reconstruction process between training and test samples to obtain a per-test-sample k value for the kNN algorithm. We then apply the resulting variable-k algorithm (i.e., GS-kNN) to learning tasks such as classification, regression, and missing value imputation. In the reconstruction process, we employ a least-squares loss function to minimize the reconstruction error, use an ℓ1-norm to generate a different k value for each test sample, impose an ℓ2,1-norm to induce row sparsity that removes the influence of noisy training samples, and utilize Locality Preserving Projection (LPP) to preserve the local structure of the data. With this objective function, GS-kNN obtains the correlation between each test sample and the training samples, which is then used to design new classification, regression, and missing-value-imputation rules for real applications. Finally, the proposed GS-kNN method is evaluated with extensive experiments on real datasets, covering classification, regression, and missing value imputation; the experimental results show that GS-kNN outperforms previous kNN algorithms in terms of classification accuracy, correlation coefficient, and root mean square error (RMSE).
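The core idea of the reconstruction step can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it keeps only the least-squares loss and the ℓ1 term of the objective (the ℓ2,1 row-sparsity and LPP terms are omitted for brevity), solves it with ISTA, and treats the number of nonzero reconstruction coefficients as the data-driven k for that test sample. The helper names `ista_lasso` and `gs_knn_classify` are hypothetical.

```python
import numpy as np

def ista_lasso(X, y, lam=0.1, n_iter=500):
    """Solve min_w 0.5 * ||X w - y||^2 + lam * ||w||_1 via ISTA.

    X: (d, n) matrix whose columns are training samples; y: (d,) test sample.
    Returns the sparse reconstruction coefficients w of shape (n,).
    """
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the smooth part
    step = 1.0 / L
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        z = w - step * grad
        # soft-thresholding: the proximal operator of the l1 penalty
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

def gs_knn_classify(X_train, y_train, x_test, lam=0.1):
    """Reconstruct x_test from the training samples; the nonzero
    coefficients select that test sample's neighbours (its own k),
    and their magnitudes weight the class vote.
    """
    w = ista_lasso(X_train.T, x_test, lam=lam)
    idx = np.flatnonzero(np.abs(w) > 1e-8)   # data-driven k = idx.size
    if idx.size == 0:
        # degenerate case: fall back to the single nearest neighbour
        idx = np.array([np.argmin(np.linalg.norm(X_train - x_test, axis=1))])
        w = np.zeros(len(X_train))
        w[idx] = 1.0
    votes = {}
    for i in idx:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + abs(w[i])
    return max(votes, key=votes.get), idx.size
```

Because the ℓ1 penalty zeroes out most coefficients, two test samples generally end up with different numbers of active neighbours, which is exactly the per-sample k behaviour the abstract describes.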
