DCPE co-training: Co-training based on diversity of class probability estimation

Co-training is a semi-supervised learning technique that uses two base learners to label unlabeled data. Standard co-training approaches augment the training set with the most confidently labeled unlabeled examples. In this paper, we investigate co-training approaches with a focus on the diversity issue and propose the diversity of class probability estimation (DCPE) co-training approach. The key idea of DCPE co-training is to use the DCPE between the two base learners to choose which newly labeled unlabeled examples to add to the training set. We compare the results with classic co-training, tri-training, and self-training methods. Our experimental study on UCI benchmark data sets shows that DCPE co-training is robust and efficient for classification.
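The selection rule sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch, not the authors' exact algorithm: it assumes each base learner outputs per-class probability estimates, measures diversity as the L1 gap between the two learners' estimates, and pseudo-labels the unlabeled examples on which the learners agree on the class but differ most in their probability estimates.

```python
# Hypothetical DCPE-style selection step (a sketch, not the paper's exact method).
# probs_a / probs_b: class-probability estimates from the two base learners,
# one list of per-class probabilities per unlabeled example.

def dcpe_select(probs_a, probs_b, k):
    """Return up to k (example_index, pseudo_label) pairs, preferring examples
    where both learners predict the same class but their class-probability
    estimates diverge the most (high DCPE)."""
    candidates = []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        ya = max(range(len(pa)), key=pa.__getitem__)  # learner A's predicted class
        yb = max(range(len(pb)), key=pb.__getitem__)  # learner B's predicted class
        if ya == yb:  # keep only examples where the learners agree on the label
            diversity = sum(abs(a - b) for a, b in zip(pa, pb))  # L1 gap in CPE
            candidates.append((diversity, i, ya))
    candidates.sort(reverse=True)  # most diverse agreements first
    return [(i, y) for _, i, y in candidates[:k]]
```

For example, with two-class estimates such as `probs_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]` and `probs_b = [[0.7, 0.3], [0.3, 0.7], [0.45, 0.55]]`, the second example is skipped (the learners disagree on the class), and the third is selected first because its probability estimates diverge the most.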
