Multimodal Clustering via Deep Commonness and Uniqueness Mining

Deep multimodal clustering have shown their competitiveness among different multimodal clustering algorithms. Existing algorithms usually boost the multimodal clustering by exploring the common knowledge among multiple modalities, which underutilizes the uniqueness of multiple modalities. In this paper, we enhance the mining of modality-common knowledge by extracting the modality-unique knowledge of each modality simultaneously. Specifically, we first utilize autoencoders to extract the modality-common and modality-unique features of each modality respectively. Meanwhile, the cross reconstruction is used to build latent connections among different modalities, i.e., maintain the consistency of modality-common features of each modality as well as heightening the diversity of modality-unique features of each modality. After that, modality-common features are fused to cluster the multimodal data. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms state-of-art works obviously.

[1]  Ronen Basri,et al.  SpectralNet: Spectral Clustering using Deep Neural Networks , 2018, ICLR.

[2]  Bernt Schiele,et al.  Learning Deep Representations of Fine-Grained Visual Descriptions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Xiaochun Cao,et al.  Diversity-induced Multi-view Subspace Clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Yun Fu,et al.  Multi-View Clustering via Deep Matrix Factorization , 2017, AAAI.

[5]  Jianping Yin,et al.  Improved Deep Embedded Clustering with Local Structure Preservation , 2017, IJCAI.

[6]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[7]  Vishal M. Patel,et al.  Deep Multimodal Subspace Clustering Networks , 2018, IEEE Journal of Selected Topics in Signal Processing.

[8]  Stan Z. Li,et al.  Exclusivity-Consistency Regularized Multi-view Subspace Clustering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[11]  Jiancheng Lv,et al.  Multi-view Spectral Clustering Network , 2019, IJCAI.

[12]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[13]  Jeff A. Bilmes,et al.  On Deep Multi-View Representation Learning , 2015, ICML.