An Integration of CoTraining and Affinity Propagation for PU Text Classification

Within the PU (Positive and Unlabeled data) learning framework, this paper proposes a novel three-step algorithm. First, CoTraining is used to filter likely positive examples out of the unlabeled dataset U. Second, the affinity propagation (AP) approach picks out strong positives from the likely positive set produced in the first step, and these examples are added to the positive dataset P. Finally, a linear One-Class SVM is trained on the purified U as negative data and the expanded P as positive data. Because the algorithm automatically expands the positive dataset, it performs especially well when the given positive dataset P is insufficient. Comprehensive experiments show that the proposed algorithm is preferable to existing ones.
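The three steps above can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the co-training base classifiers, how features are split into views, or the rule used to decide which AP clusters count as strong positives. The sketch therefore makes three assumptions. It uses nearest-centroid scorers on two halves of the feature vector as the two co-training views (hypothetical). It keeps an AP cluster when its exemplar lies closer to the centroid of P than to the centroid of the purified U (hypothetical). It substitutes a binary linear SVM trained with Pegasos-style subgradient descent for the linear One-Class SVM. Affinity propagation itself follows the standard Frey and Dueck message passing, with the median similarity as the preference. All function names are illustrative only.

```python
import numpy as np

def affinity_propagation(X, damping=0.5, iters=200):
    """Affinity propagation (Frey & Dueck) with similarity = -squared distance."""
    n = len(X)
    if n < 2:
        return np.zeros(n, dtype=int), np.arange(n)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))  # preference = median similarity
    R, A, rows = np.zeros((n, n)), np.zeros((n, n)), np.arange(n)
    for _ in range(iters):
        AS = A + S                                   # responsibility update
        best = AS.argmax(1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        Rn = S - first[:, None]
        Rn[rows, best] = S[rows, best] - AS.max(1)   # subtract the second-best value instead
        R = damping * R + (1 - damping) * Rn
        Rp = np.maximum(R, 0)                        # availability update
        np.fill_diagonal(Rp, np.diag(R))
        An = Rp.sum(0)[None, :] - Rp
        dA = np.diag(An).copy()
        An = np.minimum(An, 0)
        np.fill_diagonal(An, dA)
        A = damping * A + (1 - damping) * An
    ex = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if len(ex) == 0:                                 # fallback: single best exemplar
        ex = np.array([np.argmax(np.diag(A) + np.diag(R))])
    labels = S[:, ex].argmax(1)                      # assign each point to its nearest exemplar
    labels[ex] = np.arange(len(ex))
    return labels, ex

def cotrain_filter(P, U, rounds=3, k=5):
    """Step 1 sketch: two feature-split views alternately move confident positives out of U."""
    d = P.shape[1]
    views = [slice(0, d // 2), slice(d // 2, d)]    # hypothetical two-view split
    likely, remaining = [], list(range(len(U)))
    for _ in range(rounds):
        for v in views:
            if not remaining:
                break
            pos = np.vstack([P, U[likely]]) if likely else P  # includes the other view's picks
            Xv = U[remaining][:, v]
            cp, cn = pos[:, v].mean(0), Xv.mean(0)
            score = ((Xv - cn) ** 2).sum(1) - ((Xv - cp) ** 2).sum(1)  # > 0: nearer positives
            picked = [remaining[i] for i in np.argsort(-score)[:k] if score[i] > 0]
            likely += picked
            remaining = [i for i in remaining if i not in picked]
    return likely, remaining

def pick_strong(P, U, likely, remaining):
    """Step 2 sketch: cluster likely positives with AP; keep clusters whose exemplar is nearer P."""
    if not likely or not remaining:
        return list(likely)
    LP = U[likely]
    labels, ex = affinity_propagation(LP)
    cp, cn = P.mean(0), U[remaining].mean(0)
    strong = []
    for c, e in enumerate(ex):
        if ((LP[e] - cp) ** 2).sum() < ((LP[e] - cn) ** 2).sum():  # hypothetical rule
            strong += [likely[i] for i in np.flatnonzero(labels == c)]
    return strong

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Step 3 stand-in: binary linear SVM trained with Pegasos subgradient steps."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])      # bias folded into w
    w, t = np.zeros(Xb.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xb[i] @ w) < 1      # hinge-loss margin check
            w *= 1 - eta * lam
            if violated:
                w += eta * y[i] * Xb[i]
    return w

def predict(w, X):
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)

def pu_pipeline(P, U):
    likely, remaining = cotrain_filter(P, U)              # step 1
    strong = pick_strong(P, U, likely, remaining)         # step 2
    P_exp = np.vstack([P, U[strong]]) if strong else P    # expanded P
    N = U[remaining]                                      # purified U as negatives
    X = np.vstack([P_exp, N])
    y = np.concatenate([np.ones(len(P_exp)), -np.ones(len(N))])
    return train_linear_svm(X, y)

rng = np.random.default_rng(1)
P = rng.normal(2.0, 0.5, (5, 4))                           # small labeled positive set
U = np.vstack([rng.normal(2.0, 0.5, (20, 4)), rng.normal(-2.0, 0.5, (30, 4))])
w = pu_pipeline(P, U)
test = np.vstack([rng.normal(2.0, 0.5, (50, 4)), rng.normal(-2.0, 0.5, (50, 4))])
truth = np.concatenate([np.ones(50), -np.ones(50)])
acc = (predict(w, test) == truth).mean()
print("accuracy:", acc)
```

In this sketch, likely positives that the AP step rejects are dropped from both training sets rather than returned to U. That is one possible reading of "purified U". The synthetic data is only a smoke test and says nothing about the paper's reported results.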