A biased random-key genetic algorithm for data clustering.

Cluster analysis aims at finding subsets (clusters) of a given set of entities that are homogeneous and/or well separated. Since the 1990s, cluster analysis has been applied in several domains, with numerous applications. It has emerged as one of the most exciting interdisciplinary fields, benefiting from concepts and theoretical results obtained by different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms that are able to solve larger, real-world instances. We give an overview of the main types of clustering and of the criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed to cluster biological data.
