Learning from neighborhood for classification with local distribution characteristics

The k-nearest neighbor method predicts the label of an instance from its neighborhood. It is a simple but effective supervised classification method. However, the traditional k-nearest neighbor algorithm, which assigns the class label by majority voting, usually discards part of the useful information in the neighborhood. This paper proposes an improved k-nearest neighbor method that extracts more of this information by heuristically organizing the local distribution characteristics of the neighborhood. Unlike traditional methods, the proposed method views the neighborhood of a query sample from the perspective of local distribution. We analyze the impact of local distribution characteristics on classification and heuristically develop a formulation to estimate the membership degree, which indicates how strongly a query sample belongs to each class; the query sample is then assigned to the class with the highest membership degree. Experiments on several real data sets support the conclusion that the proposed method is superior to the traditional voting k-nearest neighbor method and comparable with or better than several state-of-the-art methods in terms of classification performance and robustness.
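
To make the baseline concrete, the following is a minimal sketch of the traditional majority-voting k-nearest neighbor rule only; it is not the proposed method, whose membership-degree formulation is not given in this abstract. The function name `knn_vote`, the Euclidean distance, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_vote(train, query, k):
    """Baseline voting k-NN: label `query` by majority vote of its k nearest samples.

    `train` is a list of (point, label) pairs. Euclidean distance is an
    assumption of this sketch.
    """
    # Keep the k training samples closest to the query.
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    # Count one vote per neighbor; distances are ignored at this step.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two "a" samples near the origin, three "b" samples far away.
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "a"),
    ((2.0, 2.0), "b"),
    ((2.1, 2.0), "b"),
    ((2.0, 2.1), "b"),
]
query = (0.2, 0.1)

print(knn_vote(train, query, k=3))
print(knn_vote(train, query, k=5))
```

With `k=3` the vote follows the two nearby "a" samples, while with `k=5` the three distant "b" samples outvote them. Because every neighbor counts equally, the vote ignores how the neighbors are distributed around the query, which is the kind of neighborhood information the abstract says majority voting loses.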
