End-to-End Adversarial-Attention Network for Multi-Modal Clustering

Multi-modal clustering aims to cluster data into different groups by exploring complementary information from multiple modalities or views. Few existing methods learn deep fused representations while simultaneously discovering the cluster structure with a discriminative loss. In this paper, we present an End-to-end Adversarial-attention network for Multi-modal Clustering (EAMC), in which adversarial learning is used to align the latent feature distributions and an attention mechanism is used to quantify the importance of each modality. To benefit from joint training, we introduce a divergence-based clustering objective that not only encourages the separation and compactness of the clusters but also yields a clear cluster structure by embedding the simplex geometry of the output space into the loss. The proposed network consists of three modules: modality-specific feature learning, modality fusion, and cluster assignment. It can be trained from scratch with batch-mode optimization and avoids an autoencoder pretraining stage. Comprehensive experiments conducted on five real-world datasets show the superiority and effectiveness of the proposed clustering method.
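To make the fusion and cluster-assignment steps more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each modality has already been encoded into a feature vector of the same length and that one unnormalized attention score per modality is available (in a real network these scores would come from a learned attention module). The function names `softmax`, `fuse_modalities`, and `soft_assign` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax; the result lies on the probability simplex."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, scores):
    """Attention-weighted fusion of modality features (illustrative assumption).

    features: list of equal-length feature vectors, one per modality.
    scores:   one unnormalized attention score per modality.
    Returns the fused vector and the normalized modality weights.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]
    return fused, weights


def soft_assign(logits):
    """Soft cluster assignment: one probability per cluster, summing to 1."""
    return softmax(logits)


# Two modalities with 2-dimensional features; the first has a higher score.
fused, weights = fuse_modalities([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
assignment = soft_assign(fused)
```

Because the modality weights and the cluster assignments both pass through a softmax, each is a point on a probability simplex, which is the output-space geometry the abstract refers to. The adversarial alignment and the divergence-based loss are not shown here.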
