Ensemble-based co-training

Semi-supervised learning algorithms such as co-training have recently been applied in many application domains. In co-training, two classifiers, based on different views of the data or on different learning algorithms, are trained in parallel; unlabeled examples that the classifiers label differently, but on which one classifier is highly confident, are then labeled by that classifier and used as training data for the other. In this paper we propose a new form of co-training, called Ensemble-Co-Training, that uses an ensemble of different learning algorithms. Based on a theorem by Angluin and Laird, which shows that probably approximately correct identification is still possible from noisy examples, we propose a criterion for selecting a subset of high-confidence predictions and for estimating the error rate of each classifier in every iteration of the training process. Experiments show that in almost all domains the new method gives better results than the other methods.
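For intuition, here is a minimal Python sketch of the ensemble-based co-training loop described above. It is not the authors' implementation (the reported experiments are based on WEKA): the scikit-learn base learners, the 0.75 confidence threshold, and the committee error estimate are all illustrative assumptions, and the acceptance test e_t * |L_t| < e_{t-1} * |L_{t-1}| is borrowed from the tri-training criterion [12], which is derived from the same Angluin and Laird result [3].

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

THRESHOLD, MAX_ITER = 0.75, 10  # illustrative knobs, not taken from the paper

X, y = load_iris(return_X_y=True)
# A small labeled pool L and a larger pool U treated as unlabeled.
X_lab, X_unl, y_lab, _ = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=0)

# An ensemble of different learning algorithms takes the place of the two views.
models = [DecisionTreeClassifier(random_state=0), GaussianNB(),
          KNeighborsClassifier(n_neighbors=3)]
for m in models:
    m.fit(X_lab, y_lab)

prev = [np.inf] * len(models)  # e_{t-1} * |L_{t-1}| for each classifier
for _ in range(MAX_ITER):
    for i, model in enumerate(models):
        committee = [m for j, m in enumerate(models) if j != i]
        # Committee-averaged confidence on the unlabeled pool; for iris the
        # argmax column index coincides with the class label (classes 0..2).
        proba = np.mean([m.predict_proba(X_unl) for m in committee], axis=0)
        conf, pseudo = proba.max(axis=1), proba.argmax(axis=1)
        mask = conf >= THRESHOLD  # the high-confidence subset L_t
        if not mask.any():
            continue
        # Committee error e_t, estimated on the original labeled data.
        agree = np.mean([m.predict_proba(X_lab) for m in committee],
                        axis=0).argmax(axis=1)
        err = np.mean(agree != y_lab)
        # Tri-training-style safeguard: accept L_t only if
        # e_t * |L_t| < e_{t-1} * |L_{t-1}|.
        if err * mask.sum() >= prev[i]:
            continue
        prev[i] = err * mask.sum()
        # Retrain classifier i on L plus the freshly pseudo-labeled L_t
        # (re-labeled each round rather than accumulated, tri-training style).
        model.fit(np.vstack([X_lab, X_unl[mask]]),
                  np.concatenate([y_lab, pseudo[mask]]))

In the paper's formulation the confidence criterion and the error estimate come from the ensemble itself; swapping in other base learners or a different threshold only changes the knobs above, not the structure of the loop.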

[1]  Xiaojin Zhu. Semi-Supervised Learning Literature Survey, 2006.

[2]  Hamideh Afsarmanesh, et al. Ensemble-training: ensemble based co-training, 2011.

[3]  D. Angluin, et al. Learning From Noisy Examples, 1988, Machine Learning.

[4]  Zhi-Hua Zhou, et al. Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples, 2007, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[5]  Mark A. Hall, et al. The WEKA data mining software: an update, 2009, SIGKDD Explorations.

[6]  Kamal Nigam, et al. Analyzing the effectiveness and applicability of co-training, 2000, CIKM '00.

[7]  Sanjoy Dasgupta, et al. PAC Generalization Bounds for Co-training, 2001, NIPS.

[8]  Yan Zhou, et al. Democratic co-learning, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[9]  Geoffrey I. Webb. Decision Tree Grafting From the All Tests But One Partition, 1999, IJCAI.

[10]  Josef Kittler, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[11]  Avrim Blum, et al. Combining Labeled and Unlabeled Data with Co-Training, 1998, COLT.

[12]  Zhi-Hua Zhou, et al. Tri-training: exploiting unlabeled data using three classifiers, 2005, IEEE Transactions on Knowledge and Data Engineering.

[13]  Olivier Chapelle, et al. Semi-Supervised Learning (Adaptive Computation and Machine Learning), 2006, MIT Press.

[14]  Leo Breiman. Random Forests, 2001, Machine Learning.

[15]  Sally A. Goldman, et al. Enhancing Supervised Learning with Unlabeled Data, 2000, ICML.