Cluster Analysis and Mathematical Programming

Given a set of entities, cluster analysis aims to find subsets, called clusters, that are homogeneous and/or well separated. Because many types of clustering and many criteria for homogeneity or separation are of interest, the field is vast. This paper surveys it from a mathematical programming viewpoint. The steps of a clustering study, the types of clustering, and the criteria are discussed first. Algorithms for hierarchical, partitioning, sequential, and additive clustering are then studied. Emphasis is on solution methods: dynamic programming, graph-theoretical algorithms, branch-and-bound, cutting planes, column generation, and heuristics.

Acknowledgment: Research supported by ONR grant N00014-95-1-0917, FCAR grant 95-ER-1048 and NSERC grants GP0105574 and GP0036426. State-of-the-art survey to be presented at the XVIth Mathematical Programming Symposium, Lausanne, August 25–29, 1997; to appear in Mathematical Programming, Series B.
