Uniform Deviation Bounds for Unbounded Loss Functions like k-Means

Uniform deviation bounds limit the difference between a model's expected loss and its loss on an empirical sample uniformly for all models in a learning problem. As such, they are a critical component of empirical risk minimization. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions that are *unbounded*. In our main application, this allows us to obtain bounds for $k$-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $\mathcal{O}\left(m^{-\frac12}\right)$ in the sample size $m$, compared to the previously known $\mathcal{O}\left(m^{-\frac14}\right)$ rate. Furthermore, we show that the rate also depends on the kurtosis, the normalized fourth moment, which measures the "tailedness" of a distribution. We further provide improved rates under progressively stronger assumptions, namely bounded higher moments, subgaussianity, and bounded support.
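To make the quantities in this abstract concrete, here is a minimal sketch in plain Python. It computes the $k$-Means loss of a point (squared distance to its nearest center), the empirical risk of a fixed set of centers on a sample, and a sample version of the normalized fourth moment. Several assumptions apply. The function names are hypothetical. The multivariate kurtosis is taken as $\mathbb{E}[d(x,\mu)^4]/\sigma^4$ with $\sigma^2 = \mathbb{E}[d(x,\mu)^2]$, which is our reading of "normalized fourth moment" rather than a definition quoted from the paper. The demo measures the gap between empirical and expected loss for one *fixed* set of centers only, whereas a uniform deviation bound controls this gap simultaneously for all sets of $k$ centers.

```python
import random


def kmeans_loss(x, centers):
    """k-Means loss of a single point: squared Euclidean distance to its nearest center."""
    return min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers)


def empirical_risk(sample, centers):
    """Average k-Means loss of `centers` over a finite sample of points."""
    return sum(kmeans_loss(x, centers) for x in sample) / len(sample)


def normalized_fourth_moment(sample):
    """Sample analogue of E[d(x, mu)^4] / sigma^4, where sigma^2 = E[d(x, mu)^2].

    The multivariate form is an assumption of this sketch; in one dimension
    it reduces to the ordinary (non-excess) kurtosis.
    """
    m, dim = len(sample), len(sample[0])
    mu = [sum(x[j] for x in sample) / m for j in range(dim)]
    sq_dist = [sum((x[j] - mu[j]) ** 2 for j in range(dim)) for x in sample]
    sigma2 = sum(sq_dist) / m
    fourth = sum(s * s for s in sq_dist) / m
    return fourth / sigma2 ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    # One center at the mean of N(0, 1): the expected loss equals Var(x) = 1.
    centers = [(0.0,)]
    for m in (100, 10_000):
        sample = [(rng.gauss(0.0, 1.0),) for _ in range(m)]
        gap = abs(empirical_risk(sample, centers) - 1.0)
        print(f"m={m:>6}  |empirical - expected| = {gap:.4f}")

    # Heavier tails give a larger normalized fourth moment.
    gaussian = [(rng.gauss(0.0, 1.0),) for _ in range(50_000)]
    laplace = [(rng.choice((-1.0, 1.0)) * rng.expovariate(1.0),) for _ in range(50_000)]
    print("Gaussian:", round(normalized_fourth_moment(gaussian), 2))  # population value is 3
    print("Laplace: ", round(normalized_fourth_moment(laplace), 2))   # population value is 6
```

The gap for a single draw typically shrinks as $m$ grows, though no individual run is guaranteed to show this. The Laplace sample, whose tails are heavier than the Gaussian's, has a larger normalized fourth moment. This is the "tailedness" to which the abstract says the rate is sensitive.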
