Semi-supervised deep embedded clustering

Abstract: Clustering is an important topic in machine learning and data mining. Deep clustering, which uses deep neural networks to learn feature representations suited to clustering, has recently attracted increasing attention across clustering applications. Deep embedded clustering (DEC) is one of the state-of-the-art deep clustering methods. However, DEC does not use prior knowledge to guide the learning process. In this paper, we propose a new scheme, semi-supervised deep embedded clustering (SDEC), to overcome this limitation. Concretely, SDEC learns feature representations that favor the clustering task and performs cluster assignment simultaneously. In contrast to DEC, SDEC incorporates pairwise constraints into feature learning, so that in the learned feature space data samples belonging to the same cluster are close to each other and samples belonging to different clusters are far apart. Extensive experiments on real benchmark data sets validate the effectiveness and robustness of the proposed method.
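The way pairwise constraints can shape the embedding may be easier to see in code. The following is a minimal NumPy sketch, not the authors' implementation: the abstract does not state the objective, so the Student's t soft assignment and sharpened target distribution follow the usual DEC formulation, while the signed constraint matrix `a` (+1 for must-link, -1 for cannot-link, 0 otherwise), the weight `lam`, and all function names are assumptions made for illustration.

```python
import numpy as np


def soft_assign(z, mu, alpha=1.0):
    """Soft cluster assignment q (n, k) using a Student's t kernel.

    z: embedded points, shape (n, d); mu: cluster centers, shape (k, d).
    """
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target p: square q, divide by soft cluster size, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def pairwise_penalty(z, a):
    """Signed sum of squared embedding distances, averaged over n points.

    a[i, k] = +1 (must-link) rewards small distances,
    a[i, k] = -1 (cannot-link) rewards large distances,
    a[i, k] = 0 leaves the pair unconstrained.
    """
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    return (a * d2).sum() / z.shape[0]


def sdec_style_loss(z, mu, a, lam=0.1):
    """KL(p || q) clustering term plus a weighted pairwise-constraint term.

    lam is an arbitrary illustrative weight, not a value from the paper.
    """
    q = soft_assign(z, mu)
    p = target_distribution(q)
    kl = (p * np.log(p / q)).sum()
    return kl + lam * pairwise_penalty(z, a)
```

In a full training loop `z` would be the output of an encoder network and the loss would be minimized with respect to both the encoder weights and `mu`. Here `z` is a plain array so that the two terms can be inspected directly.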
