MPRK algorithm for clustering large text datasets

Text document clustering partitions massive collections of text documents into a smaller number of meaningful clusters. Although numerous clustering approaches have been proposed over the last few decades, the reviewed literature reports that partitional clustering algorithms perform well on document clustering. In this research, a Modified Parallel Rough K-means (MPRK) algorithm is proposed for clustering text documents. It is evaluated on datasets, and the results are compared with the benchmark algorithms K-means and DPPSOK-means. The experimental analysis shows that the proposed algorithm produces more efficient results than the existing algorithms.
