On the Complexity of Clustering with Relaxed Size Constraints

We study the computational complexity of computing an optimal clustering \(\{A_1,A_2,\ldots,A_k\}\) of a set of points, assuming that every cluster size \(|A_i|\) belongs to a given set \(M\) of positive integers. We present a polynomial-time algorithm for solving the problem in dimension 1, i.e., when the points are simply rational values, for an arbitrary set \(M\) of size constraints; this extends to the \(\ell_1\)-norm an analogous procedure known for the \(\ell_2\)-norm. Moreover, we prove that in the Euclidean plane, i.e., in dimension 2 with the \(\ell_2\)-norm, the problem is NP-hard even when the set of size constraints is restricted to \(M=\{2,3\}\).
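
To make the one-dimensional case concrete, the following is a minimal sketch of a dynamic program of the kind that could solve it; it is not claimed to be the procedure of the paper. It rests on two assumptions that the abstract does not state: that the \(\ell_1\) cost of a cluster is the sum of absolute deviations from its median, and that some optimal clustering consists of contiguous runs of the sorted points. The function name `size_constrained_1d_l1` and its signature are hypothetical.

```python
import math


def size_constrained_1d_l1(points, k, M):
    """Illustrative sketch only (hypothetical helper, not the paper's algorithm).

    Splits `points` into exactly k clusters whose sizes all lie in M,
    minimising the total absolute deviation from each cluster's median.
    ASSUMPTION: an optimal clustering uses contiguous runs of the sorted points.
    Returns (cost, clusters), or None if no feasible clustering exists.
    """
    xs = sorted(points)
    n = len(xs)

    # prefix[i] = xs[0] + ... + xs[i-1], so any range sum is O(1).
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)

    def cost(i, j):
        # Sum of |x - median| over the non-empty run xs[i:j].
        m = (i + j) // 2
        med = xs[m]
        left = med * (m - i) - (prefix[m] - prefix[i])
        right = (prefix[j] - prefix[m]) - med * (j - m)
        return left + right

    # best[c][j] = minimum cost of covering xs[:j] with c admissible clusters.
    best = [[math.inf] * (n + 1) for _ in range(k + 1)]
    choice = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for c in range(1, k + 1):
        for j in range(1, n + 1):
            for s in M:
                if s <= j and best[c - 1][j - s] < math.inf:
                    cand = best[c - 1][j - s] + cost(j - s, j)
                    if cand < best[c][j]:
                        best[c][j] = cand
                        choice[c][j] = s

    if best[k][n] == math.inf:
        return None

    # Walk the recorded sizes backwards to recover the clusters.
    clusters = []
    j = n
    for c in range(k, 0, -1):
        s = choice[c][j]
        clusters.append(xs[j - s:j])
        j -= s
    clusters.reverse()
    return best[k][n], clusters
```

Under these assumptions the table has \(O(nk)\) entries and each entry tries every size in \(M\), so the sketch runs in \(O(nk|M|)\) time after sorting. For example, `size_constrained_1d_l1([1, 2, 10, 11, 12], 2, {2, 3})` returns cost 3 with clusters `[1, 2]` and `[10, 11, 12]`.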
