An effective framework based on local cores for self-labeled semi-supervised classification

Abstract Semi-supervised self-labeled methods exploit unlabeled data to improve classifiers that would otherwise be trained on labeled data alone. Nevertheless, using unlabeled data can degrade prediction accuracy, and one cause is that the labeled data are insufficient to train a reliable initial classifier. Existing solutions to this shortage of initial labeled data still have technical defects: for example, they fail to handle non-spherical data and cannot effectively enrich the initial labeled set when labeled data are extremely scarce. In this paper, we propose an effective semi-supervised self-labeled framework based on local cores, aiming to solve the shortage of initial labeled data in self-labeled methods and to overcome the defects above. The framework has two main components: (a) the inadequate initial labeled set is enlarged by adding local cores whose labels are predicted through active labeling or co-labeling; (b) any semi-supervised self-labeled method is then used to train a given classifier on the enlarged labeled set and the updated unlabeled set. Because local cores roughly reveal the data distribution, the framework works on both spherical and non-spherical data sets, and it can enrich the initial labeled set effectively even when labeled data are extremely scarce. Experiments show that the proposed framework is compatible with the tested self-labeled methods and helps them train a k-nearest-neighbor or support-vector-machine classifier when initial labeled data are insufficient.
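The two-step design described above lends itself to a compact sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: find_local_cores approximates local cores with a simple k-nearest-neighbor density rule (a stand-in for the paper's local-core construction), co_label plays the role of the co-labeling step, and self_train is a plain confidence-based self-training loop standing in for an arbitrary self-labeled method. All helper names and parameter values are hypothetical.

import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier

def find_local_cores(X, k=10):
    # Stand-in for the paper's local cores: keep points whose k-NN
    # density is no smaller than that of any of their neighbors.
    k = min(k, len(X) - 1)
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nbrs.kneighbors(X)          # column 0 is the point itself
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)
    return np.array([i for i in range(len(X))
                     if density[i] >= density[idx[i, 1:]].max()])

def co_label(X_l, y_l, X_cores):
    # Co-labeling stand-in: predict core labels with a classifier trained
    # on the scarce initial labeled data; active labeling would instead
    # query an oracle for these few points.
    clf = KNeighborsClassifier(n_neighbors=min(3, len(X_l)))
    return clf.fit(X_l, y_l).predict(X_cores)

def self_train(clf, X_l, y_l, X_u, rounds=5, batch=10):
    # Plain self-training: repeatedly move the most confident unlabeled
    # predictions into the labeled set. `clf` must support predict_proba
    # (e.g., KNeighborsClassifier or SVC(probability=True)).
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)
        proba = clf.predict_proba(X_u)
        take = np.argsort(proba.max(axis=1))[-batch:]
        X_l = np.vstack([X_l, X_u[take]])
        y_l = np.concatenate([y_l, clf.classes_[proba[take].argmax(axis=1)]])
        X_u = np.delete(X_u, take, axis=0)
    return clf.fit(X_l, y_l)

def local_core_framework(clf, X_l, y_l, X_u):
    # Step (a): enlarge the labeled set with predicted local cores.
    cores = find_local_cores(X_u)
    y_cores = co_label(X_l, y_l, X_u[cores])
    X_l = np.vstack([X_l, X_u[cores]])
    y_l = np.concatenate([y_l, y_cores])
    X_u = np.delete(X_u, cores, axis=0)
    # Step (b): hand the enlarged labeled set and the updated unlabeled
    # set to any self-labeled method (here, plain self-training).
    return self_train(clf, X_l, y_l, X_u)

Because the cores are selected before any pseudo-labeling begins, the enlarged labeled set follows the shape of the data rather than the bias of an initial classifier, which, as the abstract notes, is what lets the framework cope with non-spherical distributions and extremely scarce initial labels.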
