Clustering Oligarchies

We investigate the extent to which clustering algorithms are robust to the addition of a small, potentially adversarial, set of points. Our analysis reveals radical differences in the robustness of popular clustering methods. k-means and several related techniques are robust when the data is clusterable, and we provide a quantitative analysis capturing the precise relationship between clusterability and robustness. In contrast, common linkage-based algorithms and several standard objective-function-based clustering methods can be highly sensitive to the addition of a small set of points, even when the data is highly clusterable. We call such small but influential sets of points oligarchies. Lastly, we show that the behavior of the popular Lloyd's method with respect to oligarchies changes radically with the initialization technique.
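As a toy illustration of the sensitivity of linkage-based methods, the sketch below runs a naive single-linkage procedure on two well-separated groups of points on a line, then repeats the run after adding one distant point. It is a minimal sketch and not a construction taken from the paper: the data values, the choice of single linkage, and the helper name `single_linkage` are all assumptions made for this example.

```python
from itertools import combinations


def single_linkage(points, k):
    """Naive single linkage on 1-D points: merge the two closest clusters until k remain.

    Hypothetical helper written for this illustration only.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Single-linkage distance between two clusters is the smallest
        # gap between any member of one and any member of the other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        # i < j, so popping index j leaves index i valid.
        clusters[i] += clusters.pop(j)
    return sorted(sorted(c) for c in clusters)


# Assumed toy data: two tight, well-separated groups and one added far point.
left = [0.0, 1.0, 2.0]
right = [10.0, 11.0, 12.0]
added = [100.0]

before = single_linkage(left + right, k=2)
after = single_linkage(left + right + added, k=2)

print(before)  # [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
print(after)   # [[0.0, 1.0, 2.0, 10.0, 11.0, 12.0], [100.0]]
```

In this toy run, a single added point is enough to make the two original groups merge into one cluster when k = 2, which is the kind of outsized influence the term oligarchy refers to.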
