Frequent Itemsets Methods for Text Clustering

Text clustering is a crucial application of data mining. It can be used to structure hypertext documents or large collections of text. Many studies have investigated document clustering as a technique for improving search, information retrieval, document browsing, and automatic topic identification, in addition to the basic task of clustering itself. Major challenges remain, such as very high dimensionality and cluster labeling, especially when working with large-scale datasets. To tackle these challenges, a number of techniques that apply frequent itemset mining methods to text clustering have been proposed. In this paper, we review such techniques and highlight their strengths and limitations. Based on an analysis of the associated methodologies, we also propose a general framework for text clustering using frequent itemset mining algorithms.