DCPE co-training: Co-training based on diversity of class probability estimation

Co-training is a semi-supervised learning technique that uses two base learners to label unlabeled data. Standard co-training approaches augment the training set with the most confidently labeled unlabeled examples. In this paper, we investigate co-training approaches with a focus on the diversity issue and propose the diversity of class probability estimation (DCPE) co-training approach. The key idea of DCPE co-training is to use the DCPE between the two base learners to choose which newly labeled unlabeled examples to add to the training set. We compare the results with classic co-training, tri-training, and self-training methods. Our experimental study on UCI benchmark data sets shows that DCPE co-training is robust and efficient for classification.
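The selection rule sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch, not the authors' exact algorithm: it assumes each base learner outputs per-class probability estimates, measures diversity as the L1 gap between the two learners' estimates, and pseudo-labels the unlabeled examples on which the learners agree on the class but differ most in their probability estimates.

```python
# Hypothetical DCPE-style selection step (a sketch, not the paper's exact method).
# probs_a / probs_b: class-probability estimates from the two base learners,
# one list of per-class probabilities per unlabeled example.

def dcpe_select(probs_a, probs_b, k):
    """Return up to k (example_index, pseudo_label) pairs, preferring examples
    where both learners predict the same class but their class-probability
    estimates diverge the most (high DCPE)."""
    candidates = []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        ya = max(range(len(pa)), key=pa.__getitem__)  # learner A's predicted class
        yb = max(range(len(pb)), key=pb.__getitem__)  # learner B's predicted class
        if ya == yb:  # keep only examples where the learners agree on the label
            diversity = sum(abs(a - b) for a, b in zip(pa, pb))  # L1 gap in CPE
            candidates.append((diversity, i, ya))
    candidates.sort(reverse=True)  # most diverse agreements first
    return [(i, y) for _, i, y in candidates[:k]]
```

For example, with two-class estimates such as `probs_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]` and `probs_b = [[0.7, 0.3], [0.3, 0.7], [0.45, 0.55]]`, the second example is skipped (the learners disagree on the class), and the third is selected first because its probability estimates diverge the most.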
