Semi-Supervised Self-Training Method Based on an Optimum-Path Forest

Semi-supervised self-training methods can train an effective classifier by exploiting both labeled and unlabeled samples. Recently, a self-training method based on density peaks of data (STDP) was proposed. However, it still has shortcomings. First, STDP is sensitive to the cut-off distance $d_c$, so it is difficult to select an optimal value of $d_c$ for each data set. Second, because of $d_c$, STDP performs poorly on data sets whose density varies. To solve these problems, we present a new self-training method that connects unlabeled and labeled samples as vertices of an optimum-path forest to discover the underlying structure of the feature space. This structure is then used to guide the self-training method in training a classifier. Compared with STDP, our algorithm is parameter-free and works better on data sets with variations in density. Moreover, we find, somewhat unexpectedly, that our algorithm also has some advantages in dealing with overlapping data sets. Experimental results on real data sets demonstrate that our algorithm outperforms several previous methods in improving the performance of three base classifiers: k-nearest neighbor, support vector machine, and CART.
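The abstract does not give construction details, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a complete graph with Euclidean edge weights, labeled samples as forest roots, the maximum-edge path cost commonly used with optimum-path forests, and a 1-nearest-neighbor base classifier. The function names (`optimum_path_forest`, `nearest_neighbor_predict`, `self_train`) and the `unlabeled=-1` convention are hypothetical.

```python
import heapq
import numpy as np


def optimum_path_forest(X, roots):
    """Build a forest rooted at `roots`, minimizing the largest edge on each path.

    Assumption: complete graph with Euclidean distances as edge weights.
    Returns (pred, cost): predecessor index (-1 for roots) and path cost per sample.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    cost = np.full(n, np.inf)
    pred = np.full(n, -1)
    cost[roots] = 0.0
    heap = [(0.0, int(r)) for r in roots]
    heapq.heapify(heap)
    done = np.zeros(n, dtype=bool)
    while heap:
        c, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        for v in range(n):
            if done[v]:
                continue
            new_cost = max(c, dist[u, v])  # path cost = largest edge so far
            if new_cost < cost[v]:
                cost[v] = new_cost
                pred[v] = u
                heapq.heappush(heap, (new_cost, v))
    return pred, cost


def nearest_neighbor_predict(X_train, y_train, X_query):
    """1-NN prediction, standing in for any base classifier."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]


def self_train(X, y, unlabeled=-1):
    """Label unlabeled samples level by level along the forest structure."""
    y = np.array(y, copy=True)
    roots = np.flatnonzero(y != unlabeled)
    pred, _ = optimum_path_forest(X, roots)
    while True:
        labeled = y != unlabeled
        # Unlabeled samples whose forest predecessor is already labeled.
        frontier = np.flatnonzero(~labeled & labeled[pred])
        if frontier.size == 0:
            break
        y[frontier] = nearest_neighbor_predict(X[labeled], y[labeled], X[frontier])
    return y
```

Calling `self_train(X, y)` with `X` as an `(n, d)` float array and `y` as an integer label vector (with `-1` marking unlabeled samples) returns a fully labeled copy of `y`. In this sketch the forest only fixes the order in which samples are pseudo-labeled; the base classifier is retrained on the enlarged labeled set at every round.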
