Exploring Implicit and Explicit Geometrical Structure of Data for Deep Embedded Clustering

Clustering is an essential data analysis technique that has been studied extensively over the last decades. Previous studies have shown that data representation and data structure information are two critical factors for improving clustering performance, and these have formed two important lines of research. The first attempts to learn representative features for clustering, especially with deep neural networks. The second exploits the geometric structure information within data. Although both have achieved promising performance on many clustering tasks, few efforts have been dedicated to combining them in a unified deep clustering framework, which is the research gap we aim to bridge in this work. We propose a novel approach, Manifold regularized Deep Embedded Clustering (MDEC), to address this gap. It simultaneously models the data generating distribution, cluster assignment consistency, and the geometric structure of data in a unified framework. The proposed method can be optimized with mini-batch stochastic gradient descent and back-propagation. We evaluate MDEC on three real-world datasets (USPS, REUTERS-10K, and MNIST); experimental results demonstrate that our model outperforms baseline models and obtains state-of-the-art performance.
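The abstract does not state the objective itself, so the following is only a minimal sketch of how three such terms could be combined into one loss on a mini-batch. Everything in it is an assumption made for illustration: mean squared reconstruction error stands in for the data generating distribution, the Student's t soft assignment and sharpened target distribution from Deep Embedded Clustering stand in for cluster assignment consistency, and a k-nearest-neighbor graph Laplacian penalty stands in for the geometric structure. The function names and the weights `gamma` and `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_assignments(z, centers, alpha=1.0):
    """Student's t soft assignment of embeddings z (n, d) to centers (k, d)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, normalize by cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def knn_affinity(x, k=3):
    """Symmetric 0/1 k-nearest-neighbor affinity matrix built in input space."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest points
    w = np.zeros((len(x), len(x)))
    rows = np.repeat(np.arange(len(x)), k)
    w[rows, idx.ravel()] = 1.0
    return np.maximum(w, w.T)             # symmetrize

def manifold_penalty(z, w):
    """trace(Z^T L Z) = 0.5 * sum_ij w_ij * ||z_i - z_j||^2, with L = D - W."""
    lap = np.diag(w.sum(axis=1)) - w
    return np.trace(z.T @ lap @ z)

def combined_loss(x, x_rec, z, centers, w, gamma=0.1, lam=0.01):
    """Assumed objective: reconstruction + gamma * KL(P || Q) + lam * manifold term."""
    rec = np.mean((x - x_rec) ** 2)
    q = soft_assignments(z, centers)
    p = target_distribution(q)
    kl = np.sum(p * np.log(p / q))
    return rec + gamma * kl + lam * manifold_penalty(z, w)
```

In a real training loop `z` and `x_rec` would be the encoder and decoder outputs for the batch, and the loss would be written in an automatic differentiation framework so that gradients can be back-propagated; NumPy is used here only to keep the arithmetic visible.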
