DM2C: Deep Mixed-Modal Clustering

Data exhibited with multiple modalities are ubiquitous in real-world clustering tasks. Most existing methods, however, pose a strong assumption that the pairing information for modalities is available for all instances. In this paper, we consider a more challenging task where each instance is represented in only one modality, which we call mixed-modal data. Without any extra pairing supervision across modalities, it is difficult to find a universal semantic space for all of them. To tackle this problem, we present an adversarial learning framework for clustering with mixed-modal data. Instead of transforming all the samples into a joint modality-independent space, our framework learns the mappings across individual modal spaces by virtue of cycle-consistency. Through these mappings, we could easily unify all the samples into a single modal space and perform the clustering. Evaluations on several real-world mixed-modal datasets could demonstrate the superiority of our proposed framework.

[1]  Tao Xiang,et al.  Scalable and Effective Deep CCA via Soft Decorrelation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Ruifan Li,et al.  Cross-modal Retrieval with Correspondence Autoencoder , 2014, ACM Multimedia.

[3]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[4]  Gabriel Peyré,et al.  Computational Optimal Transport , 2018, Found. Trends Mach. Learn..

[5]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Lin Yang,et al.  Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Wei Zhang,et al.  Consistent and Specific Multi-View Subspace Clustering , 2018, AAAI.

[8]  Jianping Yin,et al.  Improved Deep Embedded Clustering with Local Structure Preservation , 2017, IJCAI.

[9]  Lior Wolf,et al.  Unsupervised Cross-Domain Image Generation , 2016, ICLR.

[10]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[11]  Feiping Nie,et al.  Heterogeneous image feature integration via multi-modal spectral clustering , 2011, CVPR 2011.

[12]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[13]  Yang Yang,et al.  Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.

[14]  Huazhu Fu,et al.  AE2-Nets: Autoencoder in Autoencoder Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Xuelong Li,et al.  Multi-view Subspace Clustering , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Hong Yu,et al.  Multi-view clustering via multi-manifold regularized non-negative matrix factorization , 2017, Neural Networks.

[17]  Ling Shao,et al.  Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval , 2018, IEEE Transactions on Image Processing.

[18]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[19]  Bo Yang,et al.  Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering , 2016, ICML.

[20]  Tao Qin,et al.  Web image clustering by consistent utilization of visual features and surrounding texts , 2005, MULTIMEDIA '05.

[21]  Hong Yu,et al.  Constrained NMF-Based Multi-View Clustering on Unmapped Data , 2015, AAAI.

[22]  Roger Levy,et al.  On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Jiebo Luo,et al.  Towards Perceptual Image Dehazing by Physics-Based Disentanglement and Adversarial Training , 2018, AAAI.

[24]  Christian Theobalt,et al.  GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Shao-Yuan Li,et al.  Partial Multi-View Clustering , 2014, AAAI.

[26]  Jiawei Han,et al.  Multi-View Clustering via Joint Nonnegative Matrix Factorization , 2013, SDM.

[27]  Hong Yu,et al.  Weighted Multi-View Spectral Clustering Based on Spectral Perturbation , 2018, AAAI.

[28]  Julio Gonzalo,et al.  A comparison of extrinsic clustering evaluation metrics based on formal constraints , 2008, Information Retrieval.

[29]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[30]  Dong Xu,et al.  Semi-Supervised Heterogeneous Fusion for Multimedia Data Co-Clustering , 2014, IEEE Transactions on Knowledge and Data Engineering.

[31]  Zhaoxiang Zhang,et al.  CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation , 2017, AAAI.

[32]  Philip Bachman,et al.  Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data , 2018, ICML.

[33]  Qingming Huang,et al.  When to Learn What: Deep Cognitive Subspace Clustering , 2018, ACM Multimedia.

[34]  Qingming Huang,et al.  Split Multiplicative Multi-View Subspace Clustering , 2019, IEEE Transactions on Image Processing.

[35]  Masashi Sugiyama,et al.  Learning Discrete Representations via Information Maximizing Self-Augmented Training , 2017, ICML.

[36]  Sham M. Kakade,et al.  Multi-view clustering via canonical correlation analysis , 2009, ICML '09.

[37]  Éric Gaussier,et al.  Deep k-Means: Jointly Clustering with k-Means and Learning Representations , 2018, Pattern Recognit. Lett..

[38]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[39]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[40]  Christopher J. C. Burges,et al.  Spectral clustering and transductive learning with multiple views , 2007, ICML '07.

[41]  H. Kuhn The Hungarian method for the assignment problem , 1955 .