Combining labeled and unlabeled data for text classification with a large number of categories

We develop a framework for incorporating unlabeled data into the error-correcting output coding (ECOC) setup: a multiclass problem is decomposed into multiple binary problems, and co-training is then used to learn each binary classifier. We show that the method is especially useful for classification tasks with a large number of categories, where co-training does not perform well on its own. Combined with ECOC, it outperforms several other algorithms that combine labeled and unlabeled data for text classification in terms of accuracy, precision-recall tradeoff, and efficiency.
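The ECOC decomposition described above can be illustrated with a minimal sketch. The class names, the 5-bit code matrix, and the function names are hypothetical and chosen only for illustration; the abstract does not specify the code used. The binary learners are left out: in the framework described here, each binary problem would be learned with co-training on labeled and unlabeled documents.

```python
# Hypothetical 4-class, 5-bit code matrix (an assumption, not from the paper).
# Each row is a class codeword; each column defines one binary problem.
# The minimum Hamming distance between codewords is 3, so a single
# wrong binary prediction can still be decoded to the right class.
CODE = {
    "sports":   (0, 0, 0, 0, 0),
    "politics": (1, 1, 1, 0, 0),
    "science":  (0, 0, 1, 1, 1),
    "business": (1, 1, 0, 1, 1),
}


def binary_labels(labels, bit):
    """Relabel multiclass labels as 0/1 targets for the binary problem at column `bit`."""
    return [CODE[y][bit] for y in labels]


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits):
    """Return the class whose codeword is closest to the predicted bit vector."""
    return min(CODE, key=lambda c: hamming(CODE[c], predicted_bits))


if __name__ == "__main__":
    # Targets for the first binary problem (column 0).
    print(binary_labels(["sports", "politics", "science"], 0))
    # One binary classifier is wrong (second bit flipped); decoding still recovers the class.
    print(decode((1, 0, 1, 0, 0)))
```

Nearest-codeword decoding by Hamming distance is the simplest choice; other decoding rules, such as weighting each bit by classifier confidence, would fit the same structure.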
