Research on distributed text clustering based on frequent itemsets

Text clustering, an important area of natural language processing, is a key technology for processing and organizing massive text data. In the era of big data, however, the sheer volume of data poses great challenges to both the running time and the accuracy of text clustering. This paper focuses on the speed and precision of text clustering, combining a genetic algorithm, feedback, and distributed computing. A distributed text clustering method based on frequent itemsets is proposed. Experimental results show that it finds the globally optimal cluster centers more efficiently and makes the clustering more accurate.