Hybrid local boosting utilizing unlabeled data in classification tasks

In many real-life applications, a fully labeled data set is not available. An ideal learning algorithm should therefore be able to learn from both labeled and unlabeled data. In this work, a two-stage local boosting algorithm for handling semi-supervised classification tasks is proposed. The proposed method can be briefly described as: (a) a two-stage local boosting method (b) that self-labels examples of the unlabeled data and (c) employs them in semi-supervised classification tasks. Grounded on the local application of the boosting-by-reweighting version of AdaBoost, the proposed method utilizes unlabeled data to enhance its classification performance. Simulations on thirty synthetic and real-world benchmark data sets show that the proposed method significantly outperforms nine other well-known semi-supervised classification methods in terms of classification accuracy.
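Since the abstract gives only a high-level description, the two-stage idea can be illustrated with a short sketch. The Python code below is a minimal, hedged reconstruction, not the authors' exact algorithm: it uses scikit-learn's AdaBoostClassifier (which reweights training samples, matching the boosting-by-reweighting flavor of AdaBoost) fitted on the nearest labeled neighbors of each query point. The neighborhood size K, the confidence threshold TAU, and the single self-labeling pass are illustrative assumptions.

# A minimal sketch of the two-stage scheme described above, assuming:
#  - scikit-learn's AdaBoostClassifier (sample reweighting, decision
#    stumps by default) as the locally trained booster,
#  - a neighborhood size K and self-labeling threshold TAU chosen here
#    for illustration only, and
#  - a single self-labeling pass over the unlabeled data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestNeighbors

K = 25     # assumed neighborhood size
TAU = 0.9  # assumed confidence threshold for self-labeling

def local_boost_predict(X_lab, y_lab, x):
    """Fit AdaBoost on the K labeled points nearest to x; return the
    predicted label and its confidence."""
    nn = NearestNeighbors(n_neighbors=min(K, len(X_lab))).fit(X_lab)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    y_near = y_lab[idx]
    if len(np.unique(y_near)) == 1:   # all neighbors agree:
        return y_near[0], 1.0         # nothing to boost
    clf = AdaBoostClassifier(n_estimators=50).fit(X_lab[idx], y_near)
    proba = clf.predict_proba(x.reshape(1, -1))[0]
    return clf.classes_[proba.argmax()], proba.max()

def two_stage_local_boost(X_lab, y_lab, X_unlab, X_test):
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    # Stage 1: self-label the unlabeled examples whose local model is
    # confident enough, and add them to the labeled pool.
    X_aug, y_aug = list(X_lab), list(y_lab)
    for x in np.asarray(X_unlab):
        label, conf = local_boost_predict(X_lab, y_lab, x)
        if conf >= TAU:
            X_aug.append(x)
            y_aug.append(label)
    X_aug, y_aug = np.asarray(X_aug), np.asarray(y_aug)
    # Stage 2: classify each test point with a boosted model trained
    # locally on the augmented pool.
    return np.array([local_boost_predict(X_aug, y_aug, x)[0]
                     for x in np.asarray(X_test)])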
