Ranking-based algorithms for learning from positive and unlabeled examples

Many real-world classification applications fall into the problem of learning from positive (P) and unlabeled (U) examples. Most algorithms proposed for this problem follow a two-step strategy: 1) identify a set of reliable negative examples (RN) from U; 2) apply a standard classification algorithm to RN and P. Intuitively, the quality of the negative-extraction methods (NEMs) in step 1 is critical, since the classifiers used in step 2 can be very sensitive to noise in RN. Unfortunately, most existing NEMs assume that plenty of positive examples are available and fail when positive examples are scarce. Furthermore, most studies do not attempt to extract positive examples from U, even though a classifier trained on an enlarged P (formed by adding positive examples extracted from U) could plausibly perform better. We therefore propose ranking-based algorithms that extract both reliable positive and reliable negative examples from U, and then use these examples to train the subsequent classifiers. Experimental results show that our approaches greatly enhance the effectiveness of the follow-up classifiers, especially when P is small.
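
As a concrete illustration of the two-step idea, the following minimal Python sketch ranks unlabeled examples by cosine similarity to the centroid of P, treats the lowest-ranked as reliable negatives (RN) and the highest-ranked as extracted positives, and then trains a standard classifier on the enlarged P and RN. The ranking function (centroid cosine similarity), the cutoffs n_neg and n_pos, the helper name rank_and_train, and the choice of LogisticRegression are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a ranking-based two-step PU learner (not the paper's
# exact algorithm): unlabeled examples are ranked by cosine similarity to the
# positive centroid; the lowest-ranked become reliable negatives (RN) and the
# highest-ranked are added to P before training a standard classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression


def rank_and_train(P, U, n_neg, n_pos):
    """Extract reliable negatives/positives from U by ranking, then train.

    P, U are (n_samples, n_features) arrays; n_neg / n_pos control how many
    unlabeled examples are treated as reliable negatives / positives.
    """
    # Cosine similarity of each unlabeled example to the positive centroid.
    centroid = P.mean(axis=0)
    sims = U @ centroid / (
        np.linalg.norm(U, axis=1) * np.linalg.norm(centroid) + 1e-12
    )
    order = np.argsort(sims)        # ascending: least positive-like first

    RN = U[order[:n_neg]]           # reliable negatives: least similar to P
    RP = U[order[-n_pos:]]          # extracted positives: most similar to P

    X = np.vstack([P, RP, RN])      # enlarged P plus extracted negatives
    y = np.concatenate([
        np.ones(len(P) + len(RP)),  # positives labeled 1
        np.zeros(len(RN)),          # reliable negatives labeled 0
    ])
    return LogisticRegression(max_iter=1000).fit(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(loc=2.0, size=(20, 5))               # small positive set
    U = np.vstack([rng.normal(loc=2.0, size=(80, 5)),   # hidden positives
                   rng.normal(loc=-2.0, size=(120, 5))])  # hidden negatives
    clf = rank_and_train(P, U, n_neg=50, n_pos=30)
    print(clf.predict(rng.normal(loc=-2.0, size=(3, 5))))  # expect zeros
```

In a scheme like this, the cutoffs trade the purity of RN against its size: conservative cutoffs keep RN clean, which matters because the step-2 classifier can be sensitive to label noise in RN.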
