Paper Information - Text Classification Using Clustering

Text Classification Using Clustering

This paper addresses the problem of learning to classify texts by exploiting information derived from both the training and testing sets. To accomplish this, clustering is used as a complementary step to text classification and is applied not only to the training set but also to the testing set. This approach allows us to estimate the location of the testing examples and the structure of the whole dataset, which is not possible for an inductive learner. Incorporating the knowledge resulting from clustering into the simple bag-of-words (BOW) representation of the texts is expected to boost the performance of a classifier. Experiments conducted on the tasks and datasets provided in the framework of the ECML/PKDD 2006 Discovery Challenge on personalized spam filtering demonstrate the effectiveness of the proposed approach. The experiments show substantial improvements in classification performance, especially for small training sets.
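The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy texts, the one-hot cluster features, the seeding of k-means with the first training vectors, and the nearest-centroid classifier are all assumptions chosen to keep the example small and dependency-free.

```python
from collections import Counter

def bow_vectors(texts):
    """Map each text to a term-count vector over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, seeds, iters=10):
    """Plain k-means; centroids start at the given seed vectors."""
    centroids = [list(s) for s in seeds]
    assign = []
    for _ in range(iters):
        assign = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assign

def augment(vectors, assign, k, weight=1.0):
    """Append a one-hot cluster indicator to every BOW vector (assumed encoding)."""
    return [
        v + [weight if a == c else 0.0 for c in range(k)]
        for v, a in zip(vectors, assign)
    ]

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier: assign each test vector to the closest class mean."""
    labels = sorted(set(train_y))
    cents = {l: mean([x for x, y in zip(train_x, train_y) if y == l]) for l in labels}
    return [min(labels, key=lambda l: sq_dist(x, cents[l])) for x in test_x]

# Hypothetical toy data, not taken from the challenge datasets.
train_texts = ["cheap pills buy now", "meeting agenda for monday"]
train_labels = ["spam", "ham"]
test_texts = ["buy cheap pills online", "monday meeting notes"]

# Cluster training and testing texts together, as the abstract describes.
all_vectors = bow_vectors(train_texts + test_texts)
k = 2
assign = kmeans(all_vectors, seeds=all_vectors[:k])

# Add the cluster information to the BOW representation, then classify.
augmented = augment(all_vectors, assign, k)
n_train = len(train_texts)
predictions = nearest_centroid_predict(
    augmented[:n_train], train_labels, augmented[n_train:]
)
print(assign, predictions)
```

Because the clustering step sees the unlabeled testing texts, the added cluster features carry information about where those texts sit relative to the training examples, which a purely inductive learner trained on the labeled set alone would not have.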

T. Kalamboukis | Antonia Kyriakopoulou

[1] Michael McGill, et al. Introduction to Modern Information Retrieval, 1983.

[2] Naftali Tishby, et al. Distributional Clustering of English Words, 1993, ACL.

[3] Yiming Yang, et al. Expert network: effective and efficient learning from human decisions in text categorization and retrieval, 1994, SIGIR '94.

[4] Yiming Yang, et al. An example-based mapping method for text categorization and retrieval, 1994, TOIS.

[5] Andreas S. Weigend, et al. A neural network approach to topic spotting, 1995.

[6] Hwee Tou Ng, et al. Feature selection, perceptron learning, and a usability case study for text categorization, 1997, SIGIR '97.

[7] Avrim Blum, et al. The Bottleneck, 2021, Monopsony Capitalism.

[8] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[9] David D. Lewis, et al. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval, 1998, ECML.

[10] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[11] Andrew McCallum, et al. Distributional clustering of words for text classification, 1998, SIGIR '98.