Improving CLOPE's Profit Value and Stability with an Optimized Agglomerative Approach

CLOPE (Clustering with sLOPE) is a simple and fast histogram-based clustering algorithm for categorical data. However, for the same data set and the same input parameter, its clustering results may differ when the transactions are presented in a different order. In this paper, a hierarchical clustering framework is proposed as an extension of CLOPE that generates stable and satisfactory clustering results through an optimized agglomerative merge process. A new clustering profit is defined as the merge criterion, and a cluster graph structure is proposed to optimize the merge iterations. Experiments on two datasets show that the agglomerative approach achieves stable clustering results with a better profit value, at the cost of a much longer running time owing to its higher computational complexity.
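As a rough illustration of profit-driven agglomerative merging, the Python sketch below computes the profit function of the original CLOPE algorithm (with repulsion parameter r) and uses it in a naive greedy merge loop. It rests on stated assumptions: the new profit definition and the cluster graph optimization are not given in this abstract, so the original CLOPE profit stands in as the merge criterion, and the names `gradient_term`, `profit`, and `agglomerate` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gradient_term(cluster, r):
    """Return S(C) / W(C)^r * |C| for one cluster (a list of transactions)."""
    hist = Counter(item for t in cluster for item in t)
    size = sum(hist.values())  # S(C): total number of item occurrences
    width = len(hist)          # W(C): number of distinct items
    return size / (width ** r) * len(cluster)


def profit(clusters, r=2.0):
    """Original CLOPE profit of a clustering (assumed here as the criterion)."""
    n = sum(len(c) for c in clusters)
    return sum(gradient_term(c, r) for c in clusters) / n


def agglomerate(transactions, r=2.0):
    """Naive greedy merging: start from singletons and repeatedly apply the
    pairwise merge with the largest profit gain until no merge helps.
    Assumption: this is a stand-in, not the paper's optimized procedure."""
    clusters = [[t] for t in transactions]
    while len(clusters) > 1:
        current = profit(clusters, r)
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            gain = profit(rest + [clusters[i] + clusters[j]], r) - current
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # no merge improves the profit
        i, j = best_pair
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] + clusters[j]]
    return clusters


if __name__ == "__main__":
    data = [("a", "b"), ("a", "b", "c"), ("a", "c", "d"),
            ("d", "e"), ("d", "e", "f")]
    result = agglomerate(data, r=2.0)
    print(result, profit(result, r=2.0))
```

Because the loop starts from singletons and always takes the best available merge, its outcome depends much less on input order than a single sequential pass, although ties between equally good merges could still be broken by position. Re-evaluating the profit for every candidate pair in every iteration also hints at why an agglomerative approach would be slower than CLOPE's sequential scan.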
