A Robust Instance Weighting Technique for Nearest Neighbor Classification in Noisy Environments

The performance of the Nearest Neighbor (NN) classifier depends strongly on the distance (or similarity) function used to find the NN of an input test pattern. Many proposed algorithms try to optimize the accuracy of the NN rule using a weighted distance function. In this scheme, a weight parameter is learned for each training instance; these weights are then used in the generalization phase to find the NN of an input test pattern. The Weighted Distance Nearest Neighbor (WDNN) algorithm adjusts the weight parameters to maximize the leave-one-out classification rate on the training set. This procedure leads to weights that overfit the training data, which degrades the performance of the method, especially in noisy environments. In this paper, we propose an enhanced version of WDNN, called Overfit Avoidance for WDNN (OAWDNN), that significantly outperforms the original algorithm in the generalization phase. The proposed method uses an early stopping approach to decrease the instance weights found by WDNN, which implicitly smooths the class boundary and consequently improves generalization. To evaluate the robustness of the algorithm, class-label noise is added to a variety of UCI datasets. The experimental results show the superiority of the proposed method in generalization accuracy.
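The instance-weighting scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the common formulation in which each training instance carries a weight and the classifier picks the instance minimizing the distance divided by that weight, so larger weights make an instance more influential (the exact form used by WDNN may differ in detail).

```python
import numpy as np

def weighted_nn_predict(X_train, y_train, weights, x):
    """Classify x by the training instance with the smallest weighted distance.

    The weighted distance is taken here as d(x, x_i) / w_i, a common
    instance-weighting formulation: a larger weight w_i shrinks the
    effective distance to instance i, making it more likely to be the NN.
    With uniform weights this reduces to the plain 1-NN rule.
    """
    d = np.linalg.norm(X_train - x, axis=1) / weights
    return y_train[np.argmin(d)]

# Toy one-dimensional data with two classes
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
w = np.ones(len(X))  # uniform weights -> ordinary 1-NN

print(weighted_nn_predict(X, y, w, np.array([1.4])))  # -> 0
print(weighted_nn_predict(X, y, w, np.array([2.8])))  # -> 1
```

Shrinking the learned weights toward a uniform value, as OAWDNN's early stopping effectively does, reduces the influence of individual (possibly mislabeled) instances and thus smooths the decision boundary.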
