Paper Information - Semi-Supervised Text Classification Using Positive and Unlabeled Data

Text classification using positive and unlabeled data (PU-learning) is the problem of building a text classifier from positive documents (P) of one class and unlabeled documents (U), which contain both positive documents and negative documents from many other classes. Several existing methods solve the PU-learning problem by building a classifier in a two-step process. Generally speaking, these methods do not perform well when P is too small. In this paper, we propose an improved method for PU-learning with a small P. The method combines graph-based semi-supervised learning with the two-step method. Experiments indicate that the improved method performs well when P is small.
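
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a graph-based step might be combined with a two-step PU approach, not the authors' method. It assumes label spreading in the style of reference [5] to score the unlabeled documents, treats the lowest-scoring ones as reliable negatives, and uses a nearest-centroid classifier as a stand-in for the second step. The function names, the parameters (alpha, sigma, neg_fraction), and the toy 2-D data are all hypothetical; in a real setting the rows of X would be document vectors such as TF-IDF.

```python
import numpy as np


def spread_labels(X, y0, alpha=0.9, sigma=1.0, iters=100):
    """Graph label spreading: F <- alpha * S @ F + (1 - alpha) * y0."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))   # Gaussian affinities
    np.fill_diagonal(W, 0.0)                    # no self-loops
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))         # D^{-1/2} W D^{-1/2}
    F = y0.astype(float)
    for _ in range(iters):
        F = alpha * (S @ F) + (1.0 - alpha) * y0
    return F


def two_step_pu(X, is_labeled_pos, neg_fraction=0.3):
    """Step 1: pick reliable negatives from U. Step 2: nearest-centroid classifier."""
    scores = spread_labels(X, is_labeled_pos.astype(float))
    unlabeled = np.flatnonzero(~is_labeled_pos)
    k = max(1, int(neg_fraction * unlabeled.size))
    # Assumption: unlabeled points with the lowest spreading scores are
    # the least connected to P and are treated as reliable negatives.
    reliable_neg = unlabeled[np.argsort(scores[unlabeled])[:k]]
    pos_centroid = X[is_labeled_pos].mean(axis=0)
    neg_centroid = X[reliable_neg].mean(axis=0)
    d_pos = np.linalg.norm(X - pos_centroid, axis=1)
    d_neg = np.linalg.norm(X - neg_centroid, axis=1)
    return d_pos < d_neg, reliable_neg


def make_toy_data(seed=0, n_per_class=30, n_labeled=3):
    """Two well-separated 2-D clusters standing in for document vectors."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(0.0, 0.5, size=(n_per_class, 2))
    neg = rng.normal(5.0, 0.5, size=(n_per_class, 2))
    X = np.vstack([pos, neg])
    y_true = np.arange(2 * n_per_class) < n_per_class
    is_labeled_pos = np.zeros(2 * n_per_class, dtype=bool)
    is_labeled_pos[:n_labeled] = True           # a deliberately small P
    return X, y_true, is_labeled_pos


X, y_true, is_labeled_pos = make_toy_data()
pred, reliable_neg = two_step_pu(X, is_labeled_pos)
print("accuracy:", (pred == y_true).mean())
```

Only three positives are labeled here, mirroring the small-P setting the abstract targets. The graph step lets those few seeds reach the rest of their cluster before any negatives are chosen.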

Shuang Yu | Chunping Li | Xueyuan Zhou

[1] Philip S. Yu, et al. Building text classifiers using positive and unlabeled examples, 2003, Third IEEE International Conference on Data Mining.

[2] Philip S. Yu, et al. Partially Supervised Classification of Text Documents, 2002, ICML.

[3] Bing Liu, et al. Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression, 2003, ICML.

[4] F. Denis. Classification and Co-training from Positive and Unlabeled Examples, 2003.

[5] Bernhard Schölkopf, et al. Learning with Local and Global Consistency, 2003, NIPS.

[6] Xiaoli Li, et al. Learning to Classify Texts Using Positive and Unlabeled Data, 2003, IJCAI.