Uniform Deviation Bounds for k-Means Clustering

Uniform deviation bounds control the difference between a model's expected loss and its loss on a random sample, uniformly over all models in a learning problem. In this paper, we provide a novel framework for obtaining uniform deviation bounds for unbounded loss functions. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$, improving on the previously known $O(m^{-1/4})$ rate. We further show that this rate also depends on the kurtosis, the normalized fourth moment, which measures the "tailedness" of the distribution. We also provide improved rates under progressively stronger assumptions, namely bounded higher moments, subgaussianity, and bounded support of the underlying distribution.
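To make the quantities concrete: the k-Means cost of a set of centers is the mean squared distance from each point to its nearest center, and a uniform deviation bound controls the gap between this cost on a sample of size $m$ and its expectation, simultaneously for all center sets. The following minimal NumPy sketch is purely illustrative and not from the paper: it uses hypothetical standard Gaussian data and one fixed set of $k=3$ centers (a pointwise check, weaker than the paper's uniform statement), and it empirically exhibits the $O(m^{-1/2})$ decay of the deviation alongside an estimate of the kurtosis of the relevant loss-like quantity.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_cost(X, centers):
    # Mean squared Euclidean distance of each point to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()

# Hypothetical setup: k = 3 fixed centers for 2-D standard Gaussian data.
centers = rng.normal(size=(3, 2))

# Approximate the "expected" cost on a very large reference sample.
X_ref = rng.normal(size=(10**6, 2))
true_cost = kmeans_cost(X_ref, centers)

# Kurtosis (normalized fourth moment) of the squared norm, used here
# only as an illustration of the "tailedness" quantity in the abstract.
z = (X_ref ** 2).sum(axis=1)
kurt = ((z - z.mean()) ** 4).mean() / z.var() ** 2
print(f"kurtosis of ||x||^2: {kurt:.2f}")

# Deviation between sample cost and expected cost, averaged over repetitions;
# it should shrink roughly in proportion to m^{-1/2}.
for m in [100, 1000, 10000, 100000]:
    devs = [abs(kmeans_cost(rng.normal(size=(m, 2)), centers) - true_cost)
            for _ in range(20)]
    print(f"m = {m:>6}: mean deviation {np.mean(devs):.4f}  "
          f"(m^-1/2 = {m ** -0.5:.4f})")
```

In this sketch, multiplying $m$ by 100 shrinks the observed deviation by roughly a factor of 10, consistent with the $O(m^{-1/2})$ rate; the paper's contribution is to show that such a rate holds uniformly over all candidate center sets, not just the fixed one used above.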
