Performance Tuning of K-Mean Clustering Algorithm: A Step towards Efficient DSS

This research is the first step in building an efficient Decision Support System (DSS) that employs Data Mining (DM) techniques: prediction, classification, clustering, and association rules. This step focuses on finding groups in a dataset whose members are very similar to each other while the groups themselves are very different from one another. Therefore, a single DM task, clustering, is applied. The main objective of the proposed research is to enhance the performance of one of the most popular clustering algorithms, k-means, to produce near-optimal decisions for telecom (telco) churn prediction and retention problems. K-means was chosen because of its performance in clustering massive data sets. However, the final clustering result of k-means depends heavily on the initial centroids, which are selected randomly. This research will be followed by a series of studies working toward the main objective: an efficient DSS that will be applied to customer banking data. In this research, a new method is proposed for finding better initial centroids, providing an efficient way of assigning data points to suitable clusters with reduced time complexity. The proposed algorithm was successfully developed and applied to customer banking data, and the evaluation results are presented.

Index Terms: Data Mining, Classification, K-Mean, Business Information, Data Envelopment Analysis, Artificial Neural Network, Rough Set Theory
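The claim that k-means results depend on the initial centroids can be illustrated with a small sketch. The abstract does not describe the paper's own initialization method, so the code below substitutes a well-known deterministic alternative, farthest-first seeding that starts from the point nearest the data mean. It is shown only for illustration and is not the authors' method. The function names (`kmeans`, `random_init`, `farthest_first_init`) and the toy data are hypothetical. Both variants run the standard Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its members. Each variant reports the sum of squared errors (SSE) of the result.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# An initializer takes the data, k, and a seeded RNG and returns k starting centroids.
Initializer = Callable[[Sequence[Point], int, random.Random], List[Point]]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(data: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Standard k-means seeding: k distinct data points chosen uniformly at random."""
    return rng.sample(list(data), k)


def farthest_first_init(data: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Illustrative deterministic seeding (NOT the paper's method; rng is unused).

    Start from the data point nearest the overall mean, then repeatedly add the
    point whose distance to its nearest already-chosen centroid is largest.
    Assumes k <= number of distinct points.
    """
    dim = len(data[0])
    mean = tuple(sum(p[d] for p in data) / len(data) for d in range(dim))
    centroids = [min(data, key=lambda p: sq_dist(p, mean))]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(data: Sequence[Point], k: int, init: Initializer,
           seed: int = 0, max_iter: int = 100) -> Tuple[List[Point], List[int], float]:
    """Lloyd's k-means with a pluggable initializer. Returns (centroids, labels, SSE)."""
    rng = random.Random(seed)
    centroids = list(init(data, k, rng))
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated groups of four points each.
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (0, 10), (0, 11), (1, 10), (1, 11)]
    _, _, sse_ff = kmeans(data, 3, farthest_first_init)
    print(f"farthest-first seeding: SSE = {sse_ff}")  # 6.0, the optimum for this data
    for seed in range(5):
        _, _, sse_r = kmeans(data, 3, random_init, seed=seed)
        print(f"random seeding, seed {seed}: SSE = {sse_r}")
```

On this toy data, farthest-first seeding places one initial centroid in each group and converges to the optimal partition with an SSE of 6.0. Random seeding can place two initial centroids in the same group, and depending on the seed it may converge to a worse local optimum. Farthest-first seeding has its own weakness: it is sensitive to outliers.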
