Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem

The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications for the social sciences because it provides a framework for clustering objects measured on a collection of nominal or ordinal attributes. In such instances, the CPP incorporates edge costs obtained from an aggregation of binary equivalence relations among the attributes. We review existing theory and methods for the CPP and propose two versions of a new neighborhood search algorithm for efficient solution. The first version (NS-R) uses a relocation algorithm in the search for improved solutions, whereas the second (NS-TS) uses an embedded tabu search routine. The new algorithms are compared to simulated annealing (SA) and tabu search (TS) algorithms from the CPP literature. Although the heuristics yielded comparable results for some test problems, the neighborhood search algorithms generally yielded the best performances for large and difficult instances of the CPP.
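To make the problem statement concrete, the following is a minimal sketch of a CPP instance built from nominal data, followed by a plain single-object relocation pass. It assumes one common cost convention from the CPP literature: the cost of an edge is the number of attributes on which two objects disagree minus the number on which they agree. The function names (`edge_costs`, `partition_cost`, `relocate`) and the toy data are hypothetical, and the relocation loop is a generic local search for illustration, not the NS-R or NS-TS procedure described in the abstract.

```python
from itertools import combinations

def edge_costs(data):
    """Assumed convention: c[i][j] = (#attributes differing) - (#attributes agreeing)."""
    n = len(data)
    c = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        agree = sum(a == b for a, b in zip(data[i], data[j]))
        c[i][j] = c[j][i] = (len(data[i]) - agree) - agree
    return c

def partition_cost(c, labels):
    """Sum of edge costs over all pairs placed in the same cluster."""
    return sum(c[i][j]
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])

def relocate(c, labels):
    """Move one object at a time to the cluster (or a new singleton)
    with the lowest total edge cost to it, until no move lowers the cost."""
    labels = list(labels)
    n = len(labels)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Total edge cost between object i and each existing cluster.
            link = {}
            for j in range(n):
                if j != i:
                    link[labels[j]] = link.get(labels[j], 0) + c[i][j]
            current = link.get(labels[i], 0)
            # A fresh singleton cluster contributes zero cost.
            candidates = list(link.items()) + [(max(labels) + 1, 0)]
            best_label, best = min(candidates, key=lambda kv: kv[1])
            if best < current:  # strict improvement guarantees termination
                labels[i] = best_label
                improved = True
    return labels

# Toy data: four objects measured on three nominal attributes (hypothetical).
data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "r"), ("b", "y", "r")]
c = edge_costs(data)
labels = relocate(c, [0, 1, 2, 3])  # start from all singletons
cost = partition_cost(c, labels)
```

Because the objective sums costs only within clusters, the number of clusters is not fixed in advance: merging two objects pays off exactly when their agreements outnumber their disagreements in aggregate, which is why the sketch always offers a new singleton as a relocation target.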
