Tight lower bound instances for k-means++ in two dimensions

The k-means++ seeding algorithm is one of the most popular algorithms for choosing the initial k centers when running Lloyd's algorithm for the k-means problem. Brunsch and Röglin [9] conjectured that k-means++ behaves well on datasets of small dimension. More specifically, they conjectured that the k-means++ seeding algorithm gives an O(log d) approximation with high probability for any d-dimensional dataset. In this work, we refute this conjecture by giving two-dimensional datasets on which the k-means++ seeding algorithm achieves an O(log k) approximation ratio with probability exponentially small in k. This solves open problems posed by Mahajan et al. [12] and by Brunsch and Röglin [9].
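
For reference, the seeding procedure discussed above is commonly described as D^2 sampling: the first center is chosen uniformly at random from the input points, and each subsequent center is chosen with probability proportional to its squared Euclidean distance to the nearest center chosen so far. The following Python sketch illustrates that standard description only; it is not the lower bound construction of this work. The names `sq_dist`, `kmeanspp_seed` and `kmeans_cost`, the toy dataset, and the uniform fallback for all-zero weights are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    # First center: uniform over the input points.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a chosen center,
            # fall back to a uniform choice instead of failing.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    # Hypothetical toy dataset in two dimensions.
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    centers = kmeanspp_seed(points, 2, random.Random(0))
    print(centers, kmeans_cost(points, centers))
```

The approximation ratio referred to in the abstract compares `kmeans_cost` of the sampled centers with the cost of an optimal set of k centers; the sketch computes only the former.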