Robust Co-Training

Co-training is a multiview semi-supervised learning algorithm that learns from both labeled and unlabeled data: it iteratively uses a classifier trained on one view to teach the classifier on the other view via its confident predictions on unlabeled examples. However, because it never examines the reliability of the labels the classifiers assign on either view, co-training can be problematic: even a few inaccurately labeled examples can severely degrade the performance of the learned classifiers. This paper proposes a new method, robust co-training, which integrates canonical correlation analysis (CCA) to inspect the predictions co-training makes on unlabeled training examples. CCA is applied to obtain a low-dimensional, closely correlated representation of the original multiview data; based on this representation, the similarities between an unlabeled example and the original labeled examples are computed. Only those examples whose predicted labels are consistent with the outcome of the CCA examination are eligible to augment the original labeled data. Robust co-training is evaluated on several classification problems, with encouraging experimental results.
