Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for k-Means Clustering

In this paper, we propose an implicit gradient descent algorithm for the classic k-means problem. The implicit gradient step, or backward Euler step, is solved via a stochastic fixed-point iteration, in which we randomly sample a mini-batch gradient in every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent (Entropy-SGD) for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results than standard k-means algorithms, in the sense that it consistently decreases the objective function (the cluster energy) and is much more robust to initialization.
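
To make the computational step described above concrete, here is a minimal NumPy sketch of the idea, not the authors' implementation: the function names (kmeans_energy_grad, stochastic_backward_euler_step) and the parameters gamma, n_inner, and batch_size are illustrative assumptions; the paper's actual step size, inner-iteration count, and sampling scheme may differ. The implicit step C_new = C - gamma * grad f(C_new) is approximated by repeatedly applying the stochastic map Y <- C - gamma * grad f_B(Y) with a fresh mini-batch B, and the trajectory average is returned as the next iterate.

```python
import numpy as np

def kmeans_energy_grad(X, C, idx=None):
    """Gradient of the k-means energy f(C) = 1/2 sum_i min_j ||x_i - c_j||^2
    with respect to the centroids C, evaluated on the points X[idx] (or all of X)."""
    pts = X if idx is None else X[idx]
    # assign each point to its nearest centroid
    d2 = ((pts[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    G = np.zeros_like(C)
    for j in range(C.shape[0]):
        members = pts[labels == j]
        if len(members) > 0:
            # d/dc_j of 1/2 ||x_i - c_j||^2 is (c_j - x_i), summed over assigned points
            G[j] = len(members) * C[j] - members.sum(axis=0)
    return G / len(pts)

def stochastic_backward_euler_step(X, C, gamma=1.0, n_inner=20, batch_size=64, rng=None):
    """One backward-Euler (implicit gradient) step, approximated by a stochastic
    fixed-point iteration with mini-batch gradients; returns the trajectory average."""
    rng = np.random.default_rng() if rng is None else rng
    Y = C.copy()
    Y_avg = np.zeros_like(C)
    for _ in range(n_inner):
        batch = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        Y = C - gamma * kmeans_energy_grad(X, Y, batch)   # fixed-point update
        Y_avg += Y / n_inner
    return Y_avg

# toy usage: three well-separated Gaussian clusters in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(200, 2)) for m in ([0, 0], [3, 0], [0, 3])])
C = X[rng.choice(len(X), 3, replace=False)].copy()
for _ in range(50):
    C = stochastic_backward_euler_step(X, C, gamma=2.0, rng=rng)
```

Averaging the inner fixed-point trajectory, rather than taking its last iterate, is what the abstract refers to as carrying the trajectory average over to the next gradient step; it damps the noise introduced by the mini-batch gradients.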
