Clustering Formulation Using Constraint Optimization

Clustering a set of data is a textbook machine learning problem and, at heart, a typical optimization problem. Given an objective function, such as minimizing the intra-cluster distances or maximizing the inter-cluster distances, the task is to find an assignment of data points to clusters that optimizes this objective. In this paper, we present two constraint programming models: one for centroid-based clustering and one for density-based clustering. As a key contribution, we show how the expressivity of the constraint programming formulation makes the standard problem easy to extend with further constraints, which yields interesting variants of the problem. We demonstrate this in two ways: first, we show that the constraint programming formulation of density-based clustering is very similar to the label propagation problem; second, we propose a variant of the standard label propagation approach.
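To make the "clustering as optimization" view concrete, here is a minimal sketch in Python. It is not the constraint programming model of the paper: it enumerates all assignments by brute force, which is feasible only for tiny inputs. The function names, the `must_link` / `cannot_link` parameters, and the choice of the within-cluster sum of squares as objective are assumptions made for illustration.

```python
from itertools import product


def within_cluster_cost(points, assignment, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return cost


def solve(points, k, must_link=(), cannot_link=()):
    """Exhaustively search for the cheapest feasible assignment.

    must_link / cannot_link are hypothetical side constraints given as
    pairs of point indices; they illustrate how extra constraints simply
    prune the set of feasible assignments.
    """
    best_assignment, best_cost = None, float("inf")
    for assignment in product(range(k), repeat=len(points)):
        if assignment[0] != 0:
            continue  # partial symmetry breaking: first point goes to cluster 0
        if len(set(assignment)) < k:
            continue  # every cluster must be non-empty
        if any(assignment[i] != assignment[j] for i, j in must_link):
            continue
        if any(assignment[i] == assignment[j] for i, j in cannot_link):
            continue
        cost = within_cluster_cost(points, assignment, k)
        if cost < best_cost:
            best_assignment, best_cost = assignment, cost
    return best_assignment, best_cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (5, 5), (5, 6)]
    print(solve(data, k=2))
    print(solve(data, k=2, cannot_link=[(0, 1)]))
```

Adding a constraint changes only the feasibility checks, not the objective or the search loop, which is the kind of extensibility the abstract attributes to a declarative formulation. A real constraint programming solver would replace the enumeration with propagation and search.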
