Convergence of the k-Means Minimization Problem using Γ-Convergence

The $k$-means method is an iterative clustering algorithm that associates each observation with one of $k$ clusters. Traditionally, the cluster centers lie in the same space as the observed data. Relaxing this requirement makes it possible to apply the $k$-means method to infinite-dimensional problems, for example multiple-target tracking and smoothing problems in the presence of unknown data association. Via a $\Gamma$-convergence argument, the associated optimization problem is shown to converge, in the sense that both the $k$-means minimum and its minimizers converge in the large-data limit to quantities that depend on the observed data only through its distribution. The theory is supplemented with two examples that demonstrate the range of problems now accessible to the $k$-means method. The first combines a nonparametric smoothing problem with unknown data association. The second addresses tracking using sparse data from a network of passive sensors.
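
For orientation, the classical finite-dimensional case can be sketched as follows. This is a minimal illustration, not the generalized setting of the paper: it assumes Euclidean data with centers in the same space, and it minimizes the empirical objective $f_n(\mu) = \frac{1}{n}\sum_{i=1}^{n} \min_{j} |x_i - \mu_j|^2$ by Lloyd-style alternation. The function names `kmeans_energy` and `lloyd` and the toy data are hypothetical, chosen only for this example.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_energy(points, centers):
    """Empirical k-means objective: mean squared distance to the nearest center."""
    return sum(min(sq_dist(x, mu) for mu in centers) for x in points) / len(points)


def lloyd(points, k, iters=50, seed=0):
    """Lloyd-style alternation (assumed classical setting, centers in data space)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iters):
        # Assignment step: attach each observation to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda idx: sq_dist(x, centers[idx]))
            clusters[j].append(x)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy data: two well-separated groups on the real line.
data = [(0.0,), (0.2,), (10.0,), (10.2,)]
fitted = lloyd(data, k=2)
print(sorted(fitted), kmeans_energy(data, fitted))
```

Neither step of the alternation increases `kmeans_energy`, which is why the iteration settles. The convergence result described in the abstract concerns a different limit: the behavior of the minimum and minimizers of the objective as the number of observations grows.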
