Text Classification from Positive and Unlabeled Data using Misclassified Data Correction

This paper addresses the cost of assembling labeled training documents, in particular the annotation of negative training documents, and presents a method for text classification from positive and unlabeled data. We apply an error detection and correction technique to the positive and negative documents as classified by a Support Vector Machine (SVM). Results on Reuters documents show that the method is comparable to the current state-of-the-art biased-SVM method: our method obtained an F-score of 0.627, versus 0.614 for biased-SVM.
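
For readers unfamiliar with the evaluation measure, the following is a minimal sketch of how an F-score can be computed from classification counts. It assumes the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which F-score variant or averaging scheme was used. The function name `f_score` and the counts below are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive and false-negative counts.
    Returns 0.0 when there are no true positives, which avoids division by zero.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
tp, fp, fn = 60, 30, 40
score = f_score(tp, fp, fn)
print(f"F-score: {score:.3f}")
```

Because the measure combines precision and recall, it penalizes a classifier that labels many unlabeled documents as positive just as it penalizes one that misses true positives.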
