Local Distribution in Neighborhood for Classification

The k-nearest-neighbor method classifies a query sample based on the information contained in its neighborhood. Previous studies of the k-nearest-neighbor algorithm have usually obtained the decision value for a class by combining the support of each sample in the neighborhood. Because they generally consider the nearest neighbors separately, information that belongs to the neighborhood as a whole and may be important for classification, such as its distribution, is lost. This article proposes a novel local learning method that organizes the information in the neighborhood through its local distribution. In the proposed method, additional distribution information in the neighborhood is estimated and then organized; the classification decision is made by maximum posterior probability, which is estimated from the local distribution in the neighborhood. Based on the local distribution, we also derive a generalized local classification form that can be applied effectively to various datasets by tuning its parameters. We evaluate the classification performance of the proposed method on both synthetic and real datasets; the experimental results demonstrate its dimensional scalability, efficiency, effectiveness, and robustness compared with several state-of-the-art classifiers. The results indicate that the proposed method is effective and promising across a broad range of domains.
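
As an illustration only, the idea of estimating a local distribution in the neighborhood and deciding by maximum posterior might be sketched as follows. The neighbor-share prior, the diagonal Gaussian likelihood, the function name `local_distribution_classify`, and the parameters `k` and `var_floor` are assumptions made for this sketch; they are not taken from the article, whose actual estimator may differ.

```python
import numpy as np


def local_distribution_classify(X, y, query, k=10, var_floor=1e-3):
    """Pick the class with the largest local posterior in the k-neighborhood.

    Assumed model (not from the article): per-class prior = share of
    neighbors, per-class likelihood = diagonal Gaussian fitted to the
    neighbors of that class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    query = np.asarray(query, dtype=float)

    # 1. Neighborhood: the k training samples closest to the query.
    dists = np.linalg.norm(X - query, axis=1)
    idx = np.argsort(dists)[:k]
    Xn, yn = X[idx], y[idx]

    # 2. Local distribution: estimate a prior and a density for each class
    #    that appears among the neighbors.
    log_post = {}
    for c in np.unique(yn):
        Xc = Xn[yn == c]
        prior = len(Xc) / len(Xn)
        mean = Xc.mean(axis=0)
        # var_floor keeps the variance positive for tiny or flat groups.
        var = Xc.var(axis=0) + var_floor
        log_lik = -0.5 * np.sum(
            np.log(2.0 * np.pi * var) + (query - mean) ** 2 / var
        )
        log_post[c] = np.log(prior) + log_lik

    # 3. Decision: maximum (unnormalized) log posterior.
    return max(log_post, key=log_post.get)
```

Unlike plain majority voting, this sketch lets the spread and location of each class's neighbors influence the decision, so a class with fewer neighbors can still win if the query lies well inside its local distribution. The parameter `k` plays the role of the neighborhood size, and `var_floor` is a hypothetical smoothing constant that would need tuning per dataset.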
