Data Clustering Using Grouping Hyper-heuristics

Grouping problems are a class of computationally hard problems that require partitioning a given set of items optimally with respect to multiple criteria, which vary depending on the domain. A recent work proposed a general-purpose selection hyper-heuristic search framework with reusable components, designed for the rapid development of grouping hyper-heuristics to solve grouping problems. That framework was tested only on the graph colouring problem domain. Extending the previous work, this study applies the framework to data clustering and compares the performance of selection hyper-heuristics implemented with it, each pairing a heuristic (operator) selection method with a move acceptance method. Each selection hyper-heuristic performs single-point search, processing one solution at any decision point, and controls a fixed set of generic low-level heuristics designed specifically for grouping problems based on a bi-objective formulation. An archive of high-quality solutions, capturing the trade-off between the number of clusters and the overall clustering error, is maintained during the search. Empirical results on a set of benchmark problem instances verify the effectiveness, for data clustering, of a selection hyper-heuristic that won a recent hyper-heuristic challenge.
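The search scheme described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the three low-level heuristics (`move_item`, `split_group`, `merge_groups`), the use of simple random heuristic selection, the acceptance rule that compares a candidate against the archived solution with the same number of clusters, the sum of squared errors as the error measure, and the archive layout (best solution found per cluster count).

```python
import random


def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, group in zip(points, labels):
        clusters.setdefault(group, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


# Hypothetical low-level heuristics; each returns a modified copy of the labels.
def move_item(labels, rng):
    """Reassign one random item to a randomly chosen existing group."""
    new = list(labels)
    new[rng.randrange(len(new))] = rng.choice(sorted(set(new)))
    return new


def split_group(labels, rng):
    """Move one random item into a brand new group."""
    new = list(labels)
    new[rng.randrange(len(new))] = max(new) + 1
    return new


def merge_groups(labels, rng):
    """Merge two randomly chosen groups into one (no-op with a single group)."""
    groups = sorted(set(labels))
    if len(groups) < 2:
        return list(labels)
    keep, drop = rng.sample(groups, 2)
    return [keep if g == drop else g for g in labels]


def hyper_heuristic(points, k_max, iterations, seed=0):
    """Single-point selection hyper-heuristic with a per-cluster-count archive.

    Returns a dict mapping number of clusters -> (error, labels).
    """
    rng = random.Random(seed)
    heuristics = [move_item, split_group, merge_groups]
    current = [0] * len(points)  # start with all items in one cluster
    archive = {1: (sse(points, current), current)}
    for _ in range(iterations):
        heuristic = rng.choice(heuristics)   # heuristic selection (simple random)
        candidate = heuristic(current, rng)
        k = len(set(candidate))
        if k > k_max:
            continue                         # keep the cluster count bounded
        error = sse(points, candidate)
        best = archive.get(k)
        # Move acceptance (assumed rule): accept if no worse than the
        # archived solution that uses the same number of clusters.
        if best is None or error <= best[0]:
            current = candidate
            archive[k] = (error, candidate)
    return archive
```

Calling `hyper_heuristic(points, k_max=3, iterations=300)` on a list of coordinate tuples returns the archive, whose entries can be read as the trade-off between cluster count and error. Swapping in a different selection or acceptance method only changes the two marked lines in the loop, which is the kind of component pairing the study compares.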
