Robust k-means: A Theoretical Revisit

In recent years, many variants of the quadratic k-means clustering procedure have been proposed, all aiming to robustify the algorithm's performance in the presence of outliers. Broadly, two main approaches have been developed: one based on penalized regularization methods, and one based on trimming functions. In this work, we present a theoretical analysis of the robustness and consistency properties of a variant of the classical quadratic k-means algorithm, robust k-means, which borrows ideas from outlier detection in regression. We show that two outliers in a dataset are enough to break down this clustering procedure. However, if we restrict attention to "well-structured" datasets, then robust k-means can recover the underlying cluster structure in spite of the outliers. Finally, we show that, with slight modifications, the most general non-asymptotic consistency results for quadratic k-means remain valid for this robust variant.
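To make the penalized approach concrete, the following is a minimal sketch (not the paper's exact algorithm) of a penalized robust k-means in the spirit of outlier detection in regression: each point gets an outlier vector o_i, penalized by a group-lasso term, and the objective is minimized by alternating assignment, group soft-thresholding, and center updates. The function name, the deterministic initialization, and the choice of the group-lasso penalty are illustrative assumptions.

```python
import numpy as np

def robust_kmeans(X, k, lam, n_iter=50):
    """Illustrative sketch of penalized robust k-means (hypothetical helper).

    Alternately minimizes
        sum_i ||x_i - o_i - c_{a(i)}||^2 + lam * sum_i ||o_i||_2
    over assignments a, centers c, and per-point outlier vectors o_i.
    A point with o_i != 0 is flagged as an outlier.
    """
    centers = X[:k].astype(float).copy()  # naive deterministic init, for the sketch only
    O = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        # 1) Assign each "cleaned" point x_i - o_i to its nearest center.
        R = X - O
        d2 = ((R[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # 2) Update outlier vectors by group soft-thresholding of the residuals:
        #    o_i = max(1 - lam / (2 ||e_i||), 0) * e_i,  with  e_i = x_i - c_{a(i)}.
        E = X - centers[labels]
        norms = np.linalg.norm(E, axis=1, keepdims=True)
        O = np.maximum(1.0 - lam / (2.0 * np.maximum(norms, 1e-12)), 0.0) * E
        # 3) Update each center as the mean of its cleaned points.
        R = X - O
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = R[mask].mean(0)
    return centers, labels, O
```

Points whose residual norm falls below lam/2 get o_i = 0 and are treated as inliers; a gross outlier absorbs most of its own residual into o_i, so it barely drags the centers, which is the robustification mechanism the abstract refers to.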
