A flexible cluster-oriented alternative clustering algorithm for choosing from the Pareto front of solutions

Supervised alternative clustering is the problem of finding a set of clusterings that are of high quality and different from a given negative clustering. The task is therefore inherently a multi-objective optimization problem, and optimizing two conflicting objectives at the same time requires dealing with trade-offs. Most approaches in the literature optimize these objectives sequentially (one after another) or indirectly (through some heuristic combination of the objectives). Solving a multi-objective optimization problem in these ways can yield dominated solutions, that is, solutions that are not Pareto-optimal. We develop a direct algorithm, called COGNAC, which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. COGNAC applies its recombination operator at the cluster level rather than at the object level used by traditional genetic algorithms. It can accept arbitrary clustering quality and dissimilarity objectives and provides solutions dominating those obtained by other state-of-the-art algorithms. Based on COGNAC, we propose another algorithm, called SGAC, for the sequential generation of alternative clusterings, where each newly found alternative clustering is guaranteed to be different from all previous ones. Experimental results on widely used benchmarks demonstrate the advantages of our approach.
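The notion of dominated versus Pareto-optimal solutions can be illustrated with a minimal sketch. This is not the COGNAC implementation: the names `dominates` and `pareto_front`, the assumption that both objectives (quality and dissimilarity) are to be maximized, and the candidate scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each candidate clustering is summarized by a pair of scores
# (quality, dissimilarity). Assumption: both are to be maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for five candidate clusterings.
candidates = [(0.9, 0.1), (0.7, 0.6), (0.6, 0.5), (0.2, 0.9), (0.2, 0.3)]
front = pareto_front(candidates)
print(front)
```

In this toy example, (0.6, 0.5) and (0.2, 0.3) are both dominated by (0.7, 0.6), so only the remaining three candidates represent genuine trade-offs between quality and dissimilarity. A method that combines the two objectives heuristically may return one of the dominated points, which is the failure mode described above.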
