Deep Attentive Multimodal Network Representation Learning for Social Media Images

The analysis of social networks, such as the socially connected Internet of Things, has shown that intelligent information processing technology has a deep influence on industrial systems for Smart Cities. The goal of social media representation learning is to learn dense, low-dimensional, and continuous representations for multimodal data within social networks, which facilitates many real-world applications. Since social media images are usually accompanied by rich metadata (e.g., textual descriptions, tags, groups, and submitting users), modeling the image alone is not sufficient to capture the comprehensive information that social media images carry. In this work, we treat the image and its textual description as multimodal content, and transform the other meta-information into links between contents (for example, two images are linked if they are marked with the same tag or submitted by the same user). Based on the multimodal content and social links, we propose a Deep Attentive Multimodal Graph Embedding model, named DAMGE, for more effective social image representation learning. We conduct extensive experiments on both small- and large-scale datasets, and the results confirm the superiority of the proposed model on the tasks of social image classification and link prediction.
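The transformation of meta-information into links can be illustrated with a minimal sketch. It is not the DAMGE implementation: the function name `build_links`, the metadata layout (a `tags` list and a `user` field per image), and the toy records are all assumptions made for illustration. The sketch only shows the rule stated above, namely that two images are linked when they share a tag or a submitting user.

```python
from collections import defaultdict
from itertools import combinations


def build_links(images):
    """Return undirected links between images that share a tag or a user.

    `images` maps an image id to a dict with a 'tags' list and a 'user'
    string. This layout is a hypothetical one chosen for the example.
    """
    # Group image ids by each metadata value they carry.
    groups = defaultdict(set)
    for image_id, meta in images.items():
        for tag in meta.get("tags", ()):
            groups[("tag", tag)].add(image_id)
        user = meta.get("user")
        if user is not None:
            groups[("user", user)].add(image_id)

    # Every pair of images inside the same group becomes one link.
    links = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            links.add((a, b))
    return links


# Toy metadata, invented for this example.
images = {
    "img1": {"tags": ["sunset", "beach"], "user": "alice"},
    "img2": {"tags": ["beach"], "user": "bob"},
    "img3": {"tags": ["city"], "user": "alice"},
    "img4": {"tags": ["forest"], "user": "carol"},
}

links = build_links(images)
```

Here `img1` and `img2` are linked through the shared tag, `img1` and `img3` through the shared user, and `img4` remains isolated. Grouping by metadata value first avoids comparing every pair of images directly, although very popular tags would still produce many links and may need capping in practice.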
