Paper information - Applying Clustering of Hierarchical K-means-like Algorithm on Arabic Language


This study implements a K-means-like clustering technique with a hierarchical initial set (HKM). The goal is to show that clustering document sets enhances precision in information retrieval systems, as Bellot & El-Beze showed for the French language. A traditional information retrieval system is compared with a clustered one, and the effect of increasing the number of clusters on precision is also studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). Hierarchical K-means-like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference significantly improves results compared with a traditional information retrieval system without clustering. Additionally, increasing the number of clusters further was not found to be necessary to improve precision.

Keywords: Hierarchical K-means-like clustering (HKM), K-means, cluster centroids, initial partition, document distances
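To make the two ingredients named in the abstract concrete, here is a minimal sketch of TF * IDF indexing followed by a K-means-like refinement that starts from a hierarchical initial partition. The abstract does not spell out the paper's exact HKM procedure, so everything below is an assumption for illustration: the function names (`tf_idf`, `hierarchical_init`, `hkm`), the use of cosine similarity between centroids, the logarithmic IDF, and the agglomerative merge rule are hypothetical choices, not the authors' method.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zero)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean of a list of sparse vectors; an empty list gives an empty vector."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def hierarchical_init(vectors, k):
    """Assumed initial partition: merge the two most similar groups until k remain."""
    groups = [[i] for i in range(len(vectors))]
    while len(groups) > k:
        cents = [centroid([vectors[i] for i in g]) for g in groups]
        pairs = [
            (cosine(cents[a], cents[b]), a, b)
            for a in range(len(groups))
            for b in range(a + 1, len(groups))
        ]
        _, a, b = max(pairs)
        merged = groups.pop(b)  # b > a, so index a is still valid
        groups[a].extend(merged)
    return groups


def hkm(vectors, k, max_iter=20):
    """K-means-like reassignment starting from the hierarchical partition."""
    labels = [0] * len(vectors)
    for c, group in enumerate(hierarchical_init(vectors, k)):
        for i in group:
            labels[i] = c
    for _ in range(max_iter):
        cents = [
            centroid([v for v, lab in zip(vectors, labels) if lab == c])
            for c in range(k)
        ]
        new = [max(range(k), key=lambda c: cosine(v, cents[c])) for v in vectors]
        if new == labels:  # assignments are stable, so stop
            break
        labels = new
    return labels
```

A call such as `hkm(tf_idf(docs), 3)` returns one cluster label per document, where `docs` is a list of token lists. In a clustered retrieval setup, a query could then be matched against the cluster centroids first and only afterwards against the documents of the closest cluster, which is one plausible way the clustered system in the abstract might differ from the traditional one.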

Sameh Ghwanmeh

[1] Verayuth Lertnattee, et al. Multi-Dimensional Text Classification, 2002, COLING.

[2] Kevyn Collins-Thompson. A Clustering-Based Algorithm for Automatic Document Separation, 2002.

[3] Gerhard Rigoll, et al. A Novel Feature Combination Approach for Spoken Document Classification with Support Vector Machines, 2003.

[4] Marc El-Bèze, et al. Clustering by means of unsupervised decision trees or hierarchical and K-means-like algorithm, 2000.

[5] Naftali Tishby, et al. The Power of Word Clusters for Text Classification, 2006.

[6] Sebastian Thrun, et al. Text Classification from Labeled and Unlabeled Documents using EM, 2000, Machine Learning.

[7] Raghu Ramakrishnan, et al. Database Management Systems, 1976.

[8] Rayid Ghani, et al. Using Error-Correcting Codes for Text Classification, 2000, ICML.

[9] F. Schwartz, et al. Using Clustering to Boost Text Classification, 2001.

[10] Spiridon D. Likothanassis, et al. Integrating feature and instance selection for text classification, 2002, KDD.

[11] Andrew McCallum, et al. A comparison of event models for naive bayes text classification, 1998, AAAI 1998.