Research on distributed text clustering based on frequent itemset
Text clustering, an important area of natural language processing, is a key technology for processing and organizing massive text data. In the era of big data, however, the sheer volume of data poses great challenges to both the speed and the accuracy of text clustering. This paper focuses on the speed and precision of text clustering, combining a genetic algorithm, feedback, and distributed computing. A distributed text clustering method based on frequent itemsets is proposed. Experimental results show that it finds the globally optimal cluster centers more efficiently and makes the clustering more accurate.