Nearest neighbor editing aided by unlabeled data

This paper proposes a novel method for nearest neighbor editing. Nearest neighbor editing aims to improve a classifier's generalization ability by removing noisy instances from the training set. Traditionally, each instance is edited (removed or retained) according to a vote among the instances in the training set, i.e., the labeled instances. Motivated by semi-supervised learning, we propose a novel editing methodology that edits each training instance according to a vote among all available instances, both labeled and unlabeled; a sketch of this procedure is given below. We expect that editing performance can be boosted by appropriately using unlabeled data. Our idea relies on the fact that in many applications, large amounts of unlabeled data are available in addition to the training instances, since unlabeled data require no human annotation effort. Three popular data editing methods, namely edited nearest neighbor, repeated edited nearest neighbor, and All k-NN, are adopted to verify our idea and are tested on a set of UCI data sets. Experimental results indicate that all three editing methods achieve improved performance with the aid of unlabeled data, and the improvement is more remarkable when the ratio of training data to unlabeled data is small.
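The abstract does not spell out how unlabeled instances participate in the vote, so the following is only a minimal sketch of the idea. It assumes that unlabeled instances are first assigned pseudo-labels by a 1-NN classifier trained on the labeled set (a common semi-supervised heuristic), and that each labeled instance is then retained only if the majority of its k nearest neighbors in the combined labeled-plus-unlabeled pool agree with its label. All function names here are illustrative, not from the paper.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=1):
    """Predict labels for X_query by majority vote among its k nearest neighbors."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

def edit_with_unlabeled(X_l, y_l, X_u, k=3):
    """Return indices of labeled instances retained after a single editing pass."""
    # Step 1 (assumption): pseudo-label the unlabeled pool with 1-NN
    # trained on the labeled set.
    y_u = knn_predict(X_l, y_l, X_u, k=1)
    X_all = np.vstack([X_l, X_u])
    y_all = np.concatenate([y_l, y_u])
    keep = []
    for i, (x, y) in enumerate(zip(X_l, y_l)):
        # Step 2: vote among the k nearest neighbors in the combined pool,
        # excluding the instance itself (it sits at distance zero).
        dists = np.linalg.norm(X_all - x, axis=1)
        dists[i] = np.inf
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y_all[nearest], return_counts=True)
        if labels[np.argmax(counts)] == y:
            keep.append(i)
    return np.array(keep, dtype=int)

# Usage: keep = edit_with_unlabeled(X_l, y_l, X_u, k=3); X_edited, y_edited = X_l[keep], y_l[keep]
```

Repeated edited nearest neighbor and All k-NN build on the same editing pass: the former reapplies it until no further instance is removed, while the latter runs it for every neighborhood size from 1 up to k and discards any instance misclassified at any of those sizes.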
