A multi-act sequential game-based multi-objective clustering approach for categorical data

Abstract Clustering categorical data, where no natural ordering exists among the attribute values, has recently begun to attract interest. Few clustering methods have been proposed that meet the requirements of categorical data. Most of these methods focus on optimizing a single measure; however, many applications in different areas need to consider multiple incommensurable, often conflicting, criteria during clustering. Motivated by this, we developed a multi-objective clustering approach for categorical data based on sequential games. It automatically determines the appropriate number of clusters. The approach consists of three main phases. The first phase identifies initial clusters using an initialization mechanism, which has an important effect on the final clustering result. The second phase uses multi-act multi-objective sequential two-player games to determine the appropriate number of clusters. A methodology based on backward induction is used to calculate a pure Nash equilibrium for each game. Finally, the third phase constructs homogeneous clusters by optimizing intra-cluster inertia. The performance of the algorithm has been studied on both simulated and real-world datasets. Comparisons with other clustering algorithms illustrate the effectiveness of the proposed approach.
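
The backward-induction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the game tree, the actions, or the payoffs, so the `Node` class, the action names ("merge", "keep", "accept", "refuse"), and the payoff values below are all hypothetical. The sketch also assumes each player's objectives have already been collapsed into a single scalar payoff, whereas the paper's games are multi-objective. In a finite two-player sequential game with perfect information, solving from the leaves upward yields a pure-strategy equilibrium path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical game-tree node (not taken from the paper)."""
    player: Optional[int] = None                   # 0 or 1 at decision nodes
    payoffs: Optional[Tuple[float, float]] = None  # set only at leaves
    children: Dict[str, "Node"] = field(default_factory=dict)


def backward_induction(node: Node) -> Tuple[Tuple[float, float], List[str]]:
    """Return the equilibrium payoffs and action path of the subgame at node."""
    if not node.children:
        # Leaf: nothing left to decide, payoffs are given.
        return node.payoffs, []
    best_action, best_payoffs, best_path = None, None, []
    for action, child in node.children.items():
        # Solve the subgame first, then let the moving player compare outcomes.
        payoffs, path = backward_induction(child)
        if best_payoffs is None or payoffs[node.player] > best_payoffs[node.player]:
            best_action, best_payoffs, best_path = action, payoffs, path
    # Ties are broken in favour of the first action listed (an assumption).
    return best_payoffs, [best_action] + best_path


def leaf(p0: float, p1: float) -> Node:
    return Node(payoffs=(p0, p1))


# Illustrative two-stage game with made-up payoffs: player 0 moves, then player 1.
root = Node(player=0, children={
    "merge": Node(player=1, children={"accept": leaf(3, 2), "refuse": leaf(1, 1)}),
    "keep": Node(player=1, children={"accept": leaf(2, 3), "refuse": leaf(0, 0)}),
})

equilibrium_payoffs, equilibrium_path = backward_induction(root)
print(equilibrium_payoffs, equilibrium_path)  # (3, 2) ['merge', 'accept']
```

In this toy game, player 1 would accept in either subgame, so player 0 compares the payoffs 3 and 2 and chooses "merge". A multi-objective variant would need a rule for comparing payoff vectors (for example, a weighted sum or Pareto dominance), which the abstract does not state.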
