The Value of Agreement, a New Boosting Algorithm

We present a new generalization bound in which the use of unlabeled examples yields a better trade-off between training-set size and the resulting classifier's quality, thereby reducing the number of labeled examples needed to reach a given quality. The improvement is obtained by requiring the algorithms that generate the classifiers to agree on the unlabeled examples. The extent of the improvement depends on the diversity of the learners: a more diverse group of learners yields a larger improvement, whereas two copies of a single algorithm give no advantage at all. As a proof of concept, we apply the resulting algorithm, named AgreementBoost, to a web classification problem and obtain a reduction of up to 40% in the number of labeled examples required.
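The abstract describes boosting several learners while penalizing their disagreement on unlabeled data. Below is a minimal sketch of that idea, assuming a logistic loss on the labeled examples, a cross-learner variance penalty on the unlabeled examples, and regression stumps fit to the negative functional gradient. The function name agreement_boost_sketch, the penalty weight eta, and the feature subsampling used to make the two learners diverse are illustrative assumptions; this is not the paper's exact AgreementBoost procedure.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def agreement_boost_sketch(X_lab, y_lab, X_unl, n_rounds=50, eta=1.0, lr=0.1, seed=0):
    """Boost two ensembles under a shared objective:
    logistic loss on labeled data + eta * cross-learner variance on unlabeled data."""
    rng = np.random.RandomState(seed)
    n_learners = 2
    X_all = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    F = np.zeros((n_learners, len(X_all)))   # current scores of each learner on all points
    ensembles = [[] for _ in range(n_learners)]

    for _ in range(n_rounds):
        F_mean = F.mean(axis=0)              # per-point mean score across learners
        for j in range(n_learners):
            # Negative functional gradient of the combined objective w.r.t. learner j.
            neg_grad = np.zeros(len(X_all))
            margins = y_lab * F[j, :n_lab]
            neg_grad[:n_lab] = y_lab / (1.0 + np.exp(margins))                               # labeled part
            neg_grad[n_lab:] = -(2.0 * eta / n_learners) * (F[j, n_lab:] - F_mean[n_lab:])   # agreement part
            # Weak learner: a regression stump on a random half of the features,
            # a stand-in for the diverse learners the bound rewards.
            feats = rng.choice(X_all.shape[1], max(1, X_all.shape[1] // 2), replace=False)
            stump = DecisionTreeRegressor(max_depth=1).fit(X_all[:, feats], neg_grad)
            ensembles[j].append((feats, stump))
            F[j] += lr * stump.predict(X_all[:, feats])

    def predict(X_new):
        scores = np.zeros(len(X_new))
        for j in range(n_learners):
            for feats, stump in ensembles[j]:
                scores += lr * stump.predict(X_new[:, feats])
        return np.sign(scores / n_learners)  # averaged vote of the two ensembles

    return predict

if __name__ == "__main__":
    # Tiny synthetic demo: 60 labeled, 140 unlabeled, 100 held-out points, labels in {-1, +1}.
    rng = np.random.RandomState(1)
    X = rng.randn(300, 6)
    y = np.sign(X[:, 0] + 0.5 * X[:, 1])
    predict = agreement_boost_sketch(X[:60], y[:60], X[60:200])
    print("held-out accuracy:", (predict(X[200:]) == y[200:]).mean())

The agreement term uses the per-point variance across learners, whose gradient for learner j is (2/J)(F_j - mean), so each ensemble is nudged toward the others on unlabeled points while fitting the labels on labeled points.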
