Paper Information - Clustering Large Datasets with Apriori-based Algorithm and Concurrent Processing

Clustering Large Datasets with Apriori-based Algorithm and Concurrent Processing

Abstract: This paper presents an integrated data mining technique for finding appropriate initial centroids for data clustering with the k-means algorithm. The process includes data cleansing, preprocessing, and discovering feature relations with the Apriori algorithm in order to select appropriate features. Our clustering process compares two schemes for selecting the initial centroids: static selection and random selection. The SSE (sum of squared errors) is calculated in parallel for better computational performance. We propose the Pre-KMA model, which represents the process of finding appropriate initial cluster centroids and selecting the most relevant features from large datasets. Evaluation by SSE, number of clustering iterations, and processing time confirms that the Pre-KMA model yields better results with the k-means clustering methodology. The experimental results show that both the calculated SSE and the processing time decrease.
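
The abstract does not say how the SSE calculation is parallelized. As a minimal sketch, assuming the points are split into chunks whose partial sums are computed concurrently and then added, the idea could look like the following. The function names, the chunking scheme, and the choice of Python's `ThreadPoolExecutor` are illustrative assumptions, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chunk_sse(points, centroids):
    """SSE contribution of one chunk: each point is charged to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def parallel_sse(points, centroids, n_workers=4):
    """Split the points into chunks, compute each partial SSE concurrently, add them up."""
    size = max(1, -(-len(points) // n_workers))  # ceiling division, at least 1
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(chunk_sse, chunks, [centroids] * len(chunks))
        return sum(partials)


# Hypothetical toy data: two tight groups and one centroid per group.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(1.0, 1.5), (8.5, 8.0)]
total = parallel_sse(points, centroids, n_workers=2)
```

Because SSE is a plain sum over points, the partial sums can be combined in any order, which is what makes the chunked decomposition valid. In standard CPython, threads do not speed up pure-Python arithmetic, so this sketch only shows the decomposition; a process pool or a vectorized numerical library would be a more likely route to the speedup the abstract reports.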

Nittaya Kerdprasop | Noppol Thangsupachai

[1] Tsau Young Lin, et al. Foundations and Advances in Data Mining, 2005.

[2] Hakikur Rahman, et al. Data Mining Applications for Empowering Knowledge Societies, 2008.

[3] Xindong Wu, et al. The Top Ten Algorithms in Data Mining, 2009.

[4] Sam Lightstone, et al. Data Mining - Know It All, 2008.

[5] Ronald K. Klimberg, et al. Data Mining Methods and Applications, 2007.

[6] Tansel Özyer, et al. Parallel clustering of high dimensional data by integrating multi-objective genetic algorithm with divide and conquer, 2009, Applied Intelligence.

[7] David Taniar. Data Mining and Knowledge Discovery Technologies, 2008.

[8] Huan Liu, et al. Feature selection for clustering - a filter solution, 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.

[9] Bertrand Clarke, et al. Principles and Theory for Data Mining and Machine Learning, 2009.

[10] Wei Jiang, et al. Data Mining Methods and Applications, 2006.

[11] John Wang, et al. Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, 2008.

[12] Wee Keong Ng, et al. Proportionate feature selection - A pre-processing step for clustering, 2008, 2008 IEEE International Conference on Systems, Man and Cybernetics.

[13] Oscar Castillo, et al. Proceedings of the International MultiConference of Engineers and Computer Scientists 2007, IMECS 2007, March 21-23, 2007, Hong Kong, China, 2007, IMECS.

[14] Anju Vyas. Print, 2003.