Clustering Documents using the 3-Gram Graph Representation Model

In this paper we present a document clustering method that represents documents with the 3-Gram graph model and reduces the problem of document clustering to graph partitioning. For the latter we employ the kernel k-means algorithm. We evaluated the proposed method on the Reuters-21578 test collection and compared its results with those of the Latent Dirichlet Allocation (LDA) algorithm. The results are encouraging: the 3-Gram graph method achieves much better Recall and F1 score than LDA, but worse Precision. We also identify further changes that should improve the results.
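
To make the pipeline concrete, the following is a minimal sketch of the two steps named above: building a 3-gram graph per document and clustering with kernel k-means over a pairwise similarity matrix. Several details are assumptions made for illustration and are not stated in this abstract: the 3-grams are taken at character level, edges link 3-grams that start within a window of 3 positions, the similarity is a simple size-normalized edge overlap (which is not guaranteed to be a positive semi-definite kernel), and the initial cluster assignment is supplied by the caller. The function names `trigram_graph`, `graph_similarity`, and `kernel_kmeans` are hypothetical.

```python
from collections import Counter


def trigram_graph(text, window=3):
    """Return a weighted edge map for a character 3-gram graph (assumed variant).

    Nodes are character 3-grams; an undirected edge links two 3-grams whose
    start positions are at most `window` apart, weighted by co-occurrence count.
    """
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges[tuple(sorted((gram, grams[j])))] += 1
    return edges


def graph_similarity(g1, g2):
    """Size-normalized weighted edge overlap in [0, 1] (an assumed measure)."""
    if not g1 or not g2:
        return 0.0
    shared = sum(min(g1[e], g2[e]) / max(g1[e], g2[e])
                 for e in g1.keys() & g2.keys())
    return shared / max(len(g1), len(g2))


def kernel_kmeans(K, labels, k, max_iter=50):
    """Batch kernel k-means on a precomputed similarity matrix K.

    `labels` is the caller-supplied initial assignment. The squared distance
    of point i to the centroid of cluster C in feature space is
    K[i][i] - 2 * mean_j K[i][j] + mean_{a,b} K[a][b], with j, a, b in C.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # The within-cluster term does not depend on i, so compute it once.
        within = [sum(K[a][b] for a in m for b in m) / (len(m) ** 2) if m else None
                  for m in clusters]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], float("inf")
            for c, members in enumerate(clusters):
                if not members:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in members) / len(members)
                dist = K[i][i] - 2 * cross + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Illustrative usage on toy strings (not data from the evaluation).
docs = ["oil prices rose sharply today", "oil prices fell sharply today",
        "the team won the final match", "the team lost the final match"]
graphs = [trigram_graph(d) for d in docs]
K = [[graph_similarity(a, b) for b in graphs] for a in graphs]
print(kernel_kmeans(K, [0, 0, 0, 1], k=2))
```

Batch kernel k-means is sensitive to its starting assignment, so in practice one would typically run it from several initializations and keep the partition with the lowest total within-cluster distance.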