AutoEmbedder: A semi-supervised DNN embedding system for clustering

Abstract Clustering is a widely used unsupervised learning method that deals with unlabeled data. Deep clustering has become a popular study area that relates clustering to Deep Neural Network (DNN) architectures. Deep clustering methods downsample high-dimensional data, which can also reduce the clustering loss. Deep clustering has also been introduced in semi-supervised learning (SSL). Most SSL methods depend on pairwise constraint information: a matrix encoding whether each pair of data points may belong to the same cluster. This paper introduces a novel embedding system named AutoEmbedder, which downsamples high-dimensional data to clusterable embedding points. To the best of our knowledge, this is the first research endeavor that relates a traditional classifier DNN architecture with a pairwise loss reduction technique. The training process is semi-supervised and uses a Siamese network architecture to compute the pairwise constraint loss in the feature learning phase. AutoEmbedder outperforms most existing DNN-based semi-supervised methods when evaluated on well-known datasets.
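To make the pairwise constraint loss concrete, the sketch below shows how a loss of this kind is commonly computed from two Siamese branch embeddings and a must-link/cannot-link indicator, following the standard contrastive formulation; the paper's exact loss and the `margin` value here are assumptions, not AutoEmbedder's verbatim definition.

```python
import numpy as np

def pairwise_constraint_loss(emb_a, emb_b, can_link, margin=1.0):
    """Contrastive-style pairwise constraint loss (illustrative sketch).

    emb_a, emb_b : (n, d) arrays, embeddings from the two Siamese branches.
    can_link     : (n,) array, 1 if a pair may share a cluster, 0 otherwise.
    margin       : minimum separation enforced for cannot-link pairs.
    """
    # Euclidean distance between each embedded pair.
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    # Must-link pairs are pulled together: penalize their distance.
    pull = can_link * d
    # Cannot-link pairs are pushed apart: penalize only distances below the margin.
    push = (1.0 - can_link) * np.maximum(0.0, margin - d)
    return float(np.mean(pull + push))
```

For identical embeddings, a must-link pair incurs zero loss while a cannot-link pair incurs the full margin penalty, which is what drives points from different clusters apart in the embedding space.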
