Reconsidering Representation Alignment for Multi-view Clustering

Aligning distributions of view representations is a core component of today’s state-of-the-art models for deep multi-view clustering. However, we identify several drawbacks with naïvely aligning representation distributions. We demonstrate that these drawbacks both lead to less separable clusters in the representation space and inhibit the model’s ability to prioritize views. Based on these observations, we develop a simple baseline model for deep multi-view clustering. Our baseline model avoids representation alignment altogether, while performing similarly to, or better than, the current state of the art. We also expand our baseline model by adding a contrastive learning component. This introduces a selective alignment procedure that preserves the model’s ability to prioritize views. Our experiments show that the contrastive learning component enhances the baseline model, improving on the current state of the art by a large margin on several datasets.
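As a rough illustration of the two ideas named in the abstract, the sketch below shows (a) a weighted fusion of per-view representations, which is one way a model could prioritize views, and (b) a generic InfoNCE-style contrastive loss that pulls together the representations of the same sample across two views. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `fuse` and `contrastive_loss`, the softmax weighting, the cosine similarity, and the `temperature` value are hypothetical and may differ from the objective actually used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(views, logits):
    """Fuse one sample's per-view representations by a weighted average.

    `views` is a list of equal-length vectors, one per view. `logits` holds
    one (hypothetical) learnable score per view; a larger score gives that
    view more influence on the fused representation.
    """
    weights = softmax(logits)
    dim = len(views[0])
    return [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(view_a, view_b, temperature=0.1):
    """Generic InfoNCE-style loss between two views of the same batch.

    Row i of `view_a` and row i of `view_b` form the positive pair; every
    other row of `view_b` acts as a negative for row i.
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        sims = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(sims)
        log_norm = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += log_norm - sims[i]  # negative log-probability of the positive
    return loss / n
```

With equal logits the fused vector is the plain mean of the views, while a strongly skewed logit vector lets one view dominate. The contrastive term, in turn, is small when matching rows of the two views point in the same direction and large when they are mismatched.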
