Hybrid local boosting utilizing unlabeled data in classification tasks

In many real-life applications, a fully labeled data set is not available. An ideal learning algorithm should therefore be able to learn from both labeled and unlabeled data. In this work, a two-stage local boosting algorithm for handling semi-supervised classification tasks is proposed. The proposed method can be briefly described as: (a) a two-stage local boosting method (b) that self-labels examples of the unlabeled data and (c) employs them in semi-supervised classification tasks. Grounded on the local application of the boosting-by-reweighting version of AdaBoost, the proposed method utilizes unlabeled data to enhance its classification performance. Simulations on thirty synthetic and real-world benchmark data sets show that the proposed method significantly outperforms nine other well-known semi-supervised classification methods in terms of classification accuracy.
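Since the abstract gives only a high-level description, the two-stage idea can be illustrated with a short sketch. The Python code below is a minimal, hedged reconstruction, not the authors' exact algorithm: it uses scikit-learn's AdaBoostClassifier (which reweights training samples, matching the boosting-by-reweighting flavor of AdaBoost) fitted on the nearest labeled neighbors of each query point. The neighborhood size K, the confidence threshold TAU, and the single self-labeling pass are illustrative assumptions.

# A minimal sketch of the two-stage scheme described above, assuming:
#  - scikit-learn's AdaBoostClassifier (sample reweighting, decision
#    stumps by default) as the locally trained booster,
#  - a neighborhood size K and self-labeling threshold TAU chosen here
#    for illustration only, and
#  - a single self-labeling pass over the unlabeled data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestNeighbors

K = 25     # assumed neighborhood size
TAU = 0.9  # assumed confidence threshold for self-labeling

def local_boost_predict(X_lab, y_lab, x):
    """Fit AdaBoost on the K labeled points nearest to x; return the
    predicted label and its confidence."""
    nn = NearestNeighbors(n_neighbors=min(K, len(X_lab))).fit(X_lab)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    y_near = y_lab[idx]
    if len(np.unique(y_near)) == 1:   # all neighbors agree:
        return y_near[0], 1.0         # nothing to boost
    clf = AdaBoostClassifier(n_estimators=50).fit(X_lab[idx], y_near)
    proba = clf.predict_proba(x.reshape(1, -1))[0]
    return clf.classes_[proba.argmax()], proba.max()

def two_stage_local_boost(X_lab, y_lab, X_unlab, X_test):
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    # Stage 1: self-label the unlabeled examples whose local model is
    # confident enough, and add them to the labeled pool.
    X_aug, y_aug = list(X_lab), list(y_lab)
    for x in np.asarray(X_unlab):
        label, conf = local_boost_predict(X_lab, y_lab, x)
        if conf >= TAU:
            X_aug.append(x)
            y_aug.append(label)
    X_aug, y_aug = np.asarray(X_aug), np.asarray(y_aug)
    # Stage 2: classify each test point with a boosted model trained
    # locally on the augmented pool.
    return np.array([local_boost_predict(X_aug, y_aug, x)[0]
                     for x in np.asarray(X_test)])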
