Deep Clustering: On the Link Between Discriminative Models and K-Means

In the context of recent deep clustering studies, discriminative models dominate the literature and report the most competitive performance. These models learn a deep discriminative neural network classifier in which the labels are latent. Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning. It is generally acknowledged that discriminative objective functions (e.g., those based on mutual information or the KL divergence) are more flexible than generative approaches (e.g., K-means) in the sense that they make fewer assumptions about the data distributions and typically yield much better unsupervised deep learning results. On the surface, several recent discriminative models may seem unrelated to K-means. This study shows that these models are, in fact, equivalent to K-means under mild conditions and common posterior models and parameter regularization. We prove that, for the commonly used logistic regression posteriors, maximizing the L2-regularized mutual information via an approximate alternating direction method (ADM) is equivalent to minimizing a soft and regularized K-means loss. Our theoretical analysis not only directly connects several recent state-of-the-art discriminative models to K-means, but also leads to a new soft and regularized deep K-means algorithm, which yields competitive performance on several image clustering benchmarks.
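As a concrete illustration, below is a minimal sketch (not the authors' code) of the kind of discriminative objective discussed above: L2-regularized mutual-information maximization with softmax (multinomial logistic regression) posteriors, written as a loss to be minimized. Names such as mi_loss, logits, weight and lam are illustrative, and the exact equivalence to the soft, regularized K-means loss established in the paper holds only under the conditions stated there.

    import torch
    import torch.nn.functional as F

    def mi_loss(logits, weight, lam=1e-3):
        # logits: (N, K) classifier outputs; weight: classifier weight matrix; lam: L2 strength
        p = F.softmax(logits, dim=1)                        # posteriors p(k|x)
        h_cond = -(p * torch.log(p + 1e-10)).sum(1).mean()  # conditional entropy H(Y|X): confident assignments
        p_bar = p.mean(0)                                   # marginal cluster distribution
        h_marg = -(p_bar * torch.log(p_bar + 1e-10)).sum()  # marginal entropy H(Y): balanced clusters
        # Minimizing H(Y|X) - H(Y) maximizes the mutual information I(X;Y);
        # the L2 term plays the role of the parameter regularization mentioned in the abstract.
        return h_cond - h_marg + lam * weight.pow(2).sum()

In practice, such a loss would be optimized over the parameters of a deep classifier (e.g., with a stochastic optimizer), with the clusters read off from the argmax of the posteriors.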
