Self-attention Multi-view Representation Learning with Diversity-promoting Complementarity

Multi-view learning attempts to build a better-performing model by exploiting the consensus and/or complementarity among multi-view data. However, with respect to complementarity, most existing approaches can capture only a single kind of complementary information rather than complementary information with diversity. In this paper, to exploit complementarity and consistency simultaneously, and to give free rein to the potential of deep learning in capturing diversity-promoting complementarity for multi-view representation learning, we propose a novel supervised multi-view representation learning algorithm, called the Self-Attention Multi-View network with Diversity-Promoting Complementarity (SAMVDPC), which exploits consistency through a group of encoders and uses self-attention to find complementary information entailing diversity. Extensive experiments on eight real-world datasets demonstrate the effectiveness of our proposed method and show its superiority over several baseline methods that consider only a single type of complementary information.
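To make the described architecture concrete, below is a minimal PyTorch sketch of the design the abstract outlines: one encoder per view projecting into a shared latent space (for consistency), and multi-hop self-attention over the view representations (for complementarity), with a Frobenius-norm penalty to promote diversity among attention hops. The class name `SAMVDPCSketch`, all layer sizes, the two-layer attention scorer, the fusion-by-concatenation scheme, and the diversity penalty (borrowed from structured self-attentive embeddings) are illustrative assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn


class SAMVDPCSketch(nn.Module):
    """Hypothetical sketch: per-view encoders + multi-hop self-attention."""

    def __init__(self, view_dims, latent_dim=128, att_dim=64, num_hops=4, num_classes=10):
        super().__init__()
        # One encoder per view maps each view into a shared latent space,
        # one way to encourage consistency across views.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, latent_dim), nn.ReLU()) for d in view_dims
        )
        # Two-layer scoring network for multi-hop self-attention over views.
        self.att_w1 = nn.Linear(latent_dim, att_dim, bias=False)
        self.att_w2 = nn.Linear(att_dim, num_hops, bias=False)
        self.classifier = nn.Linear(num_hops * latent_dim, num_classes)

    def forward(self, views):
        # views: list of (batch, view_dims[i]) tensors, one entry per view.
        h = torch.stack([enc(v) for enc, v in zip(self.encoders, views)], dim=1)
        # scores: (batch, num_views, num_hops); softmax over views per hop.
        scores = self.att_w2(torch.tanh(self.att_w1(h)))
        a = torch.softmax(scores.transpose(1, 2), dim=-1)  # (batch, hops, views)
        m = torch.bmm(a, h)                                # (batch, hops, latent)
        logits = self.classifier(m.flatten(1))
        # Penalty ||A A^T - I||_F^2 pushes different hops to attend to
        # different views, i.e. it promotes diverse complementary information.
        eye = torch.eye(a.size(1), device=a.device).expand(a.size(0), -1, -1)
        penalty = ((torch.bmm(a, a.transpose(1, 2)) - eye) ** 2).sum(dim=(1, 2)).mean()
        return logits, penalty


if __name__ == "__main__":
    # Toy usage: three views of dimensions 100, 50 and 200, batch of 8.
    model = SAMVDPCSketch(view_dims=[100, 50, 200])
    views = [torch.randn(8, d) for d in (100, 50, 200)]
    logits, penalty = model(views)
    labels = torch.randint(0, 10, (8,))
    loss = nn.functional.cross_entropy(logits, labels) + 0.1 * penalty
    loss.backward()
```

Under this reading, the supervised classification loss plus the weighted diversity penalty jointly train the encoders and the attention module; the penalty weight (0.1 above) is a hypothetical hyperparameter.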
