A hybrid heuristic for the k-medoids clustering problem

Clustering is an important tool for data analysis because it allows the exploration of datasets with little or no prior information. Its main goal is to group data objects according to their similarity (or dissimilarity). A well-known mathematical formulation of clustering is the k-medoids problem. Current approaches to k-medoids rely on heuristics, with good results reported in the literature. However, few methods have been proposed for analyzing the quality of the partitions these heuristics find. In this paper, we propose a hybrid Lagrangian heuristic for the k-medoids problem and compare its performance with that of other k-medoids heuristics from the literature. Experimental results show that the proposed Lagrangian heuristic outperformed the other algorithms.
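To make the link between k-medoids and partition quality concrete, the following is a minimal sketch of the textbook Lagrangian relaxation of the k-medoids (p-median) formulation, in which the "each object is assigned to exactly one medoid" constraints are relaxed. It is not the hybrid heuristic proposed in the paper, whose details are not given here. The function names, the multiplier vector `u`, and the toy distance matrix are assumptions made for illustration only.

```python
from itertools import combinations


def kmedoids_cost(dist, medoids):
    """Total distance from every object to its nearest chosen medoid."""
    return sum(min(row[j] for j in medoids) for row in dist)


def lagrangian_lower_bound(dist, k, u):
    """Lower bound on the optimal k-medoids cost for multipliers u.

    Relaxing the assignment constraints with one multiplier u[i] per
    object makes the problem separable by candidate medoid j: opening j
    contributes rho[j] = sum_i min(0, dist[i][j] - u[i]), and the k
    smallest rho values are selected.
    """
    n = len(dist)
    rho = [sum(min(0.0, dist[i][j] - u[i]) for i in range(n)) for j in range(n)]
    return sum(u) + sum(sorted(rho)[:k])


def brute_force_optimum(dist, k):
    """Exact optimum by enumeration; only practical for tiny instances."""
    n = len(dist)
    return min(kmedoids_cost(dist, m) for m in combinations(range(n), k))


if __name__ == "__main__":
    # Hypothetical toy data: four points on a line, absolute-difference distance.
    points = [0, 1, 10, 11]
    dist = [[abs(a - b) for b in points] for a in points]

    best = brute_force_optimum(dist, k=2)
    bound = lagrangian_lower_bound(dist, k=2, u=[1.0] * len(points))
    print("optimal cost:", best)
    print("Lagrangian lower bound:", bound)
```

Any multiplier vector yields a valid lower bound, so the gap between that bound and the cost of a heuristic solution gives a way to judge how far the heuristic's partition can be from optimal. In practice the multipliers are typically adjusted iteratively, for example by subgradient optimization, to tighten the bound.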
