PAC Generalization Bounds for Co-training

The rule-based bootstrapping introduced by Yarowsky, and its co-training variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences (partial rules and partial labeling of the unlabeled data) and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of k labels for k ≥ 2.
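
To convey the flavor of an agreement-based bound, here is a schematic sketch for the binary case (k = 2), not the paper's exact theorem: if the two views are conditionally independent given the label y, then for total rules h_1 and h_2 on the respective views,

\[
\Pr[h_1 \neq y]
  \;=\; \Pr[h_1 \neq y,\ h_2 = y] \;+\; \Pr[h_1 \neq y,\ h_2 \neq y]
  \;\le\; \Pr[h_1 \neq h_2] \;+\; \sum_{y} \Pr[y]\,\Pr[h_1 \neq y \mid y]\,\Pr[h_2 \neq y \mid y],
\]

where the equality is a case split (with two labels, disagreement occurs exactly when one rule is wrong), and the last term factors into a product of conditional error rates by the independence assumption. The disagreement rate \Pr[h_1 \neq h_2] requires no labels, so it can be estimated from m unlabeled examples; for a single fixed pair of rules, Hoeffding's inequality gives, with probability at least 1 − δ,

\[
\Pr[h_1 \neq h_2] \;\le\; \widehat{\Pr}[h_1 \neq h_2] \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}}.
\]

A union bound over a finite rule class adds the usual logarithm of the class size to the numerator; the bounds given in the paper go further, handling partial rules that may abstain and the multiclass setting k > 2.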