Using EM to Classify Text from Labeledand Unlabeled DocumentsKamal

This paper shows that the accuracy of learned text classiiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is signiicant because in many important text classiication problems obtaining classiication labels is expensive, while large quantities of unlabeled documents are readily available. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text, based on the combination of Expectation-Maximization with a naive Bayes classiier. The algorithm rst trains a classiier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classiier using the labels for all the documents, and iterates. Experimental results, obtained using text from three diierent real-world tasks, show that the use of unlabeled data reduces classiication error by up to 30%.

[1]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[2]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[3]  G Salton,et al.  Developments in Automatic Text Retrieval , 1991, Science.

[4]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[5]  David D. Lewis,et al.  An evaluation of phrasal and clustered representations on a text categorization task , 1992, SIGIR '92.

[6]  Michael I. Jordan,et al.  Supervised learning from incomplete data via an EM approach , 1993, NIPS.

[7]  Yiming Yang,et al.  An application of least squares fit mapping to text information retrieval , 1993, SIGIR.

[8]  T. Cover,et al.  The relative value of labeled and unlabeled samples in pattern recognition , 1993, Proceedings. IEEE International Symposium on Information Theory.

[9]  David A. Landgrebe,et al.  The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon , 1994, IEEE Trans. Geosci. Remote. Sens..

[10]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[11]  David D. Lewis,et al.  A comparison of two learning algorithms for text categorization , 1994 .

[12]  David D. Lewis,et al.  A sequential algorithm for training text classifiers: corrigendum and additional data , 1995, SIGF.

[13]  Vittorio Castelli,et al.  On the exponential value of labeled samples , 1995, Pattern Recognit. Lett..

[14]  Michael J. Pazzani,et al.  Syskill & Webert: Identifying Interesting Web Sites , 1996, AAAI/IAAI, Vol. 1.

[15]  Pedro M. Domingos,et al.  Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier , 1996, ICML.

[16]  Vittorio Castelli,et al.  The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter , 1996, IEEE Trans. Inf. Theory.

[17]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[18]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[19]  W. B. CroftCenter Combining Classiiers in Text Categorization , 1996 .

[20]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[21]  Thomas Kalt,et al.  A New Probabilistic Model of Text Classification and Retrieval , 1998 .

[22]  David J. Miller,et al.  A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data , 1996, NIPS.

[23]  Yoram Singer,et al.  Context-sensitive learning methods for text categorization , 1996, SIGIR '96.

[24]  Thorsten Joachims,et al.  A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization , 1997, ICML.

[25]  Daphne Koller,et al.  Hierarchically Classifying Documents Using Very Few Words , 1997, ICML.

[26]  Prasad Tadepalli,et al.  Active Learning with Committees for Text Categorization , 1997, AAAI/IAAI.

[27]  David D. Lewis,et al.  Threading Electronic Mail - A Preliminary Study , 1997, Inf. Process. Manag..

[28]  T. Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1999, ECML.

[29]  Andrew McCallum,et al.  A comparison of event models for naive bayes text classification , 1998, AAAI 1998.

[30]  Andrew McCallum,et al.  Employing EM and Pool-Based Active Learning for Text Classification , 1998, ICML.

[31]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[32]  Tom M. Mitchell,et al.  Learning to Extract Symbolic Knowledge from the World Wide Web , 1998, AAAI/IAAI.

[33]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[34]  William W. Cohen,et al.  Context-sensitive learning methods for text categorization , 1999, TOIS.