High-performance link-based cluster ensemble approach for categorical data clustering

In recent years, cluster ensembles have emerged as an efficient way to group data points into clusters. However, clustering remains difficult when the information available for partitioning the data is imperfect, and this makes it hard to build good clusters even with cluster ensembles. In this paper, we propose a method that addresses the degradation of clustering quality during data partitioning. The initial clusters are generated using the firefly algorithm. A link-based cluster ensemble approach then groups the data points into clusters, using a multi-viewpoint similarity measure and an entropy-based weighted triple-quality measure. This avoids convergence to a local optimum, mitigates the issues that arise with high-dimensional datasets, and improves clustering quality. Data partitioning is performed with a bipartite spectral graph partitioning algorithm and the similarity measure. Finally, an artificial neural network produces classification results from the optimized clustered datasets. Experiments on datasets from the UCI repository show that the proposed method achieves effective ensemble clustering with higher clustering accuracy than conventional methods.
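The link-based step can be made concrete with a small sketch. It follows the weighted triple-quality (WTQ) measure as it is commonly defined for link-based cluster ensembles. Each base cluster becomes a graph vertex. Clusters that share data points are joined by an edge weighted by their Jaccard overlap. Two clusters are then scored by their common neighbours, each neighbour z contributing 1/W_z, where W_z is the total edge weight at z. The abstract does not specify the entropy weighting or the multi-viewpoint similarity used in this paper, so both are omitted here. The function names, the (clustering index, label) keying of clusters, and the decay constant `dc = 0.9` are illustrative assumptions, not the authors' code.

```python
from itertools import combinations


def build_cluster_graph(ensemble):
    """Map each base cluster to its member points and link clusters that share points.

    ensemble: list of base clusterings, each a list of labels (one per data point).
    Clusters are keyed as (clustering_index, label); edge weights are the Jaccard
    overlap |Lx & Ly| / |Lx | Ly| of the member sets.
    """
    members = {}
    for t, labels in enumerate(ensemble):
        for i, label in enumerate(labels):
            members.setdefault((t, label), set()).add(i)
    neighbours = {c: {} for c in members}
    for a, b in combinations(members, 2):
        shared = len(members[a] & members[b])
        if shared:
            w = shared / len(members[a] | members[b])
            neighbours[a][b] = w
            neighbours[b][a] = w
    return members, neighbours


def wtq_similarity(neighbours, dc=0.9):
    """WTQ_xy = sum over common neighbours z of 1 / W_z, where W_z is the total edge
    weight at z; scores are divided by the maximum and scaled by the decay constant dc
    (dc = 0.9 is an assumed default)."""
    total = {c: sum(nb.values()) for c, nb in neighbours.items()}
    raw = {}
    for x, y in combinations(neighbours, 2):
        common = neighbours[x].keys() & neighbours[y].keys()
        raw[frozenset((x, y))] = sum(1.0 / total[z] for z in common)
    wtq_max = max(raw.values(), default=0.0)
    if wtq_max == 0.0:
        return {pair: 0.0 for pair in raw}
    return {pair: dc * v / wtq_max for pair, v in raw.items()}


def refined_association_matrix(ensemble, dc=0.9):
    """Rows are data points and columns are base clusters. An entry is 1 for the
    point's own cluster; otherwise it is the WTQ similarity between that cluster and
    the cluster holding the point in the same base clustering."""
    members, neighbours = build_cluster_graph(ensemble)
    sim = wtq_similarity(neighbours, dc)
    columns = sorted(members)  # assumes labels within a clustering are mutually comparable
    rm = []
    for i in range(len(ensemble[0])):
        row = []
        for t, label in columns:
            own = (t, ensemble[t][i])
            row.append(1.0 if own == (t, label) else sim[frozenset((own, (t, label)))])
        rm.append(row)
    return columns, rm
```

Consider `refined_association_matrix([[0, 0, 1, 1], [0, 0, 0, 1]])`, an ensemble of two base clusterings over four points. Points 0 and 1 receive identical rows, because every base clustering agrees on them. Point 2 receives soft, non-zero entries for the clusters it does not belong to. The resulting matrix is the kind of input that a bipartite spectral graph partitioning step would then cut into final clusters.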
