Exact algorithms for size constrained 2-clustering in the plane

We study the problem of determining an optimal bipartition $\{A, B\}$ of a set $X$ of $n$ points in $\mathbb{R}^2$, under the size constraints $|A| = k$ and $|B| = n - k$, that minimizes the dispersion of the points of $A$ and of $B$ around their respective centroids, under both the Euclidean and the Manhattan norm. Under the Euclidean norm, we show that the problem can be solved in $O(n \sqrt[3]{k} \log^2 n)$ time by using known properties of k-sets and convex hulls; moreover, the solutions for all $k = 1, 2, \ldots, \lfloor n/2 \rfloor$ can be computed in $O(n^2 \log n)$ time. Under the Manhattan norm, we present an algorithm running in $O(n^2 \log n)$ time, which uses an extended version of red-black trees to maintain a bipartition of a planar point set. For this case too, we provide a full version of the algorithm that yields the solutions for all size constraints $k$. All these procedures work in $O(n)$ space and rely on separation results for the clusters of optimal solutions.
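To make the objective concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms announced above: it enumerates all $\binom{n}{k}$ subsets and is only usable on tiny inputs. It assumes that the dispersion of a cluster is the sum of squared Euclidean distances to the mean in the Euclidean case, and the sum of Manhattan distances to the coordinate-wise median in the Manhattan case. The function names `dispersion_l2`, `dispersion_l1`, and `best_bipartition` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from statistics import mean, median


def dispersion_l2(points):
    """Sum of squared Euclidean distances to the mean (assumed Euclidean objective)."""
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)


def dispersion_l1(points):
    """Sum of Manhattan distances to the coordinate-wise median (assumed Manhattan objective)."""
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    return sum(abs(x - mx) + abs(y - my) for x, y in points)


def best_bipartition(points, k, dispersion):
    """Exhaustively search all bipartitions {A, B} with |A| = k and |B| = n - k.

    Returns (cost, A, B) minimizing dispersion(A) + dispersion(B).
    Exponential time: for illustration only.
    """
    n = len(points)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    best = None
    for idx in combinations(range(n), k):
        chosen = set(idx)
        a = [points[i] for i in idx]
        b = [points[i] for i in range(n) if i not in chosen]
        cost = dispersion(a) + dispersion(b)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    for name, disp in (("Euclidean", dispersion_l2), ("Manhattan", dispersion_l1)):
        cost, a, b = best_bipartition(pts, 2, disp)
        print(name, cost, sorted(a), sorted(b))
```

On well-separated data such as the sample points, both objectives select the two far-away points as the cluster of size 2, which is consistent with the kind of cluster separation the abstract refers to.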
