Divide-and-conquer ensemble self-training method based on probability difference

Self-training methods can train an effective classifier by exploiting both labeled and unlabeled instances. During self-training, high-confidence instances are usually selected iteratively and added to the training set. Unfortunately, the structure information carried by high-confidence instances is so similar that it leads to local over-fitting over the iterations. To avoid this over-fitting and improve the classification performance of self-training methods, a novel divide-and-conquer ensemble self-training framework based on probability difference is proposed. First, the probability difference of each instance is calculated from the category probabilities output by each classifier, and the instances of each classifier are divided into low-fuzzy and high-fuzzy instances according to this probability difference. Then a divide-and-conquer strategy is adopted: instances judged low-fuzzy by all the classifiers are labeled directly, while high-fuzzy instances are labeled manually. Finally, the newly labeled instances are added to the training set for the next self-training iteration. The method expands the training set with low-fuzzy instances, which carry accurate structure information, and high-fuzzy instances, which carry more comprehensive structure information, and this improves its generalization performance. The method is better suited to noisy data sets and can capture structure information even when only a few labeled instances are available. Its effectiveness is verified by comparative experiments on data sets from the University of California, Irvine (UCI) repository.
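
The division step described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so several details here are assumptions made only for illustration: the probability difference is taken to be the gap between the two largest class probabilities, the cut-off `threshold` is a hypothetical parameter, and any instance that is not low-fuzzy (with an agreed label) under every classifier is sent to manual labeling. The names `probability_difference` and `divide_instances` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def probability_difference(probs: Sequence[float]) -> float:
    """Assumed definition: gap between the two largest class probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def divide_instances(
    per_classifier_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Dict[int, int], List[int]]:
    """Split unlabeled instances into directly labeled and manually labeled.

    per_classifier_probs[c][i] is the class-probability vector that
    classifier c assigns to unlabeled instance i.
    """
    n_instances = len(per_classifier_probs[0])
    auto_labeled: Dict[int, int] = {}  # instance index -> agreed class label
    needs_manual: List[int] = []       # instance indices for manual labeling

    for i in range(n_instances):
        votes = [probs[i] for probs in per_classifier_probs]
        # Low-fuzzy for every classifier: each gap reaches the threshold.
        low_fuzzy_everywhere = all(
            probability_difference(p) >= threshold for p in votes
        )
        # Class index with the highest probability, per classifier.
        predicted = {max(range(len(p)), key=p.__getitem__) for p in votes}
        if low_fuzzy_everywhere and len(predicted) == 1:
            auto_labeled[i] = predicted.pop()
        else:
            needs_manual.append(i)
    return auto_labeled, needs_manual


# Toy probabilities from two classifiers for three unlabeled instances.
clf_a = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
clf_b = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]

auto_labeled, needs_manual = divide_instances([clf_a, clf_b], threshold=0.5)
print(auto_labeled)   # {0: 0}
print(needs_manual)   # [1, 2]
```

In this toy run only instance 0 is confidently and consistently classified by both classifiers, so it is labeled directly; instances 1 and 2 are ambiguous for at least one classifier and are queued for manual labeling before the next self-training iteration.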
