Paper Information - A Novel K-Means Clustering Algorithm Based on Positive Examples and Careful Seeding

Positive and unlabeled learning (PU learning) is a special form of semi-supervised learning. Its defining feature is that the training set contains only two parts: positive examples and unlabeled examples. Many real-world classification applications can be cast as PU learning problems. The K-means++ clustering algorithm introduced a new seeding method. This paper describes a semi-supervised learning algorithm for positive and unlabeled examples. Our approach extends K-means++, an enhancement to K-means that seeds the algorithm with suitably chosen cluster centers, to this setting. Experiments on the Spam and 20-newsgroup data sets show that the proposed algorithm achieves better performance.
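The abstract does not spell out how the positive examples enter the seeding step, so the following is only a minimal sketch under an assumption of ours, not the authors' algorithm. It takes the centroid of the labeled positive examples as the first center (our assumption) and then draws the remaining centers from the unlabeled points by the standard K-means++ rule, with probability proportional to the squared distance to the nearest center chosen so far. The names `seed_centers`, `centroid`, and `squared_distance` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def seed_centers(positives, unlabeled, k, rng=None):
    """Pick k initial centers for K-means.

    Assumption (not taken from the paper): the first center is the centroid
    of the labeled positive examples. The other k - 1 centers are sampled
    from the unlabeled points with probability proportional to the squared
    distance to the nearest center so far, as in standard K-means++.
    """
    rng = rng or random.Random()
    if k < 1 or k - 1 > len(unlabeled):
        raise ValueError("need 1 <= k <= len(unlabeled) + 1")
    centers = [centroid(positives)]
    # d2[i] is the squared distance from unlabeled[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in unlabeled]
    while len(centers) < k:
        if sum(d2) == 0:
            # All points coincide with a center; fall back to a uniform draw.
            idx = rng.randrange(len(unlabeled))
        else:
            idx = rng.choices(range(len(unlabeled)), weights=d2, k=1)[0]
        centers.append(list(unlabeled[idx]))
        # Refresh nearest-center distances with the newly added center.
        d2 = [min(d, squared_distance(p, centers[-1]))
              for d, p in zip(d2, unlabeled)]
    return centers
```

The returned centers would then be passed to an ordinary K-means loop as its initial centers. A point that already coincides with a chosen center has zero weight and is not drawn again unless every remaining distance is zero.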
