Semi-supervised learning based on nearest neighbor rule and cut edges

In this paper, we propose a novel semi-supervised learning approach based on the nearest neighbor rule and cut edges. In the first step of our approach, a relative neighborhood graph over all training samples is constructed for each unlabeled sample, and any unlabeled sample whose edges all connect to training samples of the same class is assigned that class label. These newly labeled samples are then added to the training set. In the second step, a standard self-training algorithm using the nearest neighbor rule is applied for classification until a predetermined stopping criterion is met. In the third step, a statistical test is applied for label modification, and in the last step, the remaining unlabeled samples are classified with the standard nearest neighbor rule. The main advantages of the proposed method are: (1) it reduces error reinforcement by using the relative neighborhood graph for classification in the early stages of semi-supervised learning; (2) it introduces a label-modification mechanism that improves classification performance. Experimental results demonstrate the effectiveness of the proposed approach.
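To make the first step concrete, the unanimous-neighborhood labeling rule can be sketched as below. A relative neighborhood graph (RNG) places an edge between two points only when no third point is strictly closer to both; an unlabeled sample is then labeled only if every labeled sample it shares an RNG edge with carries the same class. The function names, the brute-force O(n²) neighbor search, and the toy data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rng_neighbors(X, i):
    """Indices j sharing a relative-neighborhood-graph edge with sample i:
    (i, j) is an RNG edge iff no third point k satisfies
    max(d(i, k), d(j, k)) < d(i, j)."""
    d_i = np.linalg.norm(X - X[i], axis=1)   # distances from sample i to all points
    neighbors = []
    for j in range(len(X)):
        if j == i:
            continue
        d_ij = d_i[j]
        # For every candidate blocker k, take max(d(i,k), d(j,k)).
        d_max = np.maximum(d_i, np.linalg.norm(X - X[j], axis=1))
        mask = np.ones(len(X), dtype=bool)
        mask[[i, j]] = False                 # i and j cannot block their own edge
        if not np.any(d_max[mask] < d_ij):
            neighbors.append(j)
    return neighbors

def label_unanimous(X_lab, y_lab, X_unl):
    """Step-1 sketch: assign a label to an unlabeled sample only when all
    labeled samples it is RNG-connected to agree on the class.
    Returns {index into X_unl: assigned class}."""
    X = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    new_labels = {}
    for u in range(n_lab, len(X)):
        labeled_nbrs = [j for j in rng_neighbors(X, u) if j < n_lab]
        classes = {y_lab[j] for j in labeled_nbrs}
        if len(classes) == 1:                # unanimous labeled neighborhood
            new_labels[u - n_lab] = classes.pop()
    return new_labels
```

On a toy two-cluster set, a point deep inside one cluster gets that cluster's label, while a point midway between the clusters is left unlabeled and deferred to the later self-training stage.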
