Paper Information - DM2C: Deep Mixed-Modal Clustering

DM2C: Deep Mixed-Modal Clustering

Data with multiple modalities are ubiquitous in real-world clustering tasks. Most existing methods, however, make the strong assumption that pairing information across modalities is available for all instances. In this paper, we consider a more challenging task in which each instance is represented in only one modality; we call such data mixed-modal data. Without any extra pairing supervision across modalities, it is difficult to find a universal semantic space for all of them. To tackle this problem, we present an adversarial learning framework for clustering mixed-modal data. Instead of transforming all the samples into a joint modality-independent space, our framework learns the mappings across individual modal spaces by virtue of cycle-consistency. Through these mappings, we can easily unify all the samples into a single modal space and perform the clustering. Evaluations on several real-world mixed-modal datasets demonstrate the superiority of our proposed framework.
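The two ideas named in the abstract, cycle-consistency between modal spaces and clustering after unifying all samples in one space, can be illustrated with a toy sketch. This is not the paper's implementation: the abstract describes mappings learned adversarially, whereas here a fixed invertible linear map `W` stands in for them, and the names `f_ab`, `g_ba`, `cycle_consistency_loss`, `unify_in_space_a`, and `kmeans` are hypothetical. The choice of an L1 cycle penalty and of plain k-means for the final step are also assumptions, since the abstract specifies neither.

```python
import numpy as np


def cycle_consistency_loss(x_a, x_b, f_ab, g_ba):
    """Mean L1 error after a round trip A -> B -> A and B -> A -> B."""
    loss_a = np.mean(np.abs(g_ba(f_ab(x_a)) - x_a))
    loss_b = np.mean(np.abs(f_ab(g_ba(x_b)) - x_b))
    return loss_a + loss_b


def unify_in_space_a(x_a, x_b, g_ba):
    """Map modality-B samples into space A and stack them with the A samples."""
    return np.vstack([x_a, g_ba(x_b)])


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Toy setup: a known invertible linear map replaces the learned mappings.
W = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])  # hypothetical; det(W) = 7, so W is invertible
W_inv = np.linalg.inv(W)


def f_ab(x):
    return x @ W        # space A -> space B


def g_ba(y):
    return y @ W_inv    # space B -> space A


rng = np.random.default_rng(0)
x_a = rng.normal(size=(6, 3))       # instances observed only in modality A
x_b = rng.normal(size=(4, 3)) @ W   # instances observed only in modality B

loss = cycle_consistency_loss(x_a, x_b, f_ab, g_ba)
unified = unify_in_space_a(x_a, x_b, g_ba)
labels = kmeans(unified, k=2)
```

Because `g_ba` is the exact inverse of `f_ab` in this toy setup, the cycle loss is zero up to floating-point error. In a learned setting the loss would instead serve as a training signal that pushes the two mappings toward being mutual inverses, with no paired samples required.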
