User-guided Hierarchical Attention Network for Multi-modal Social Image Popularity Prediction

Popularity prediction for the growing volume of social images opens up opportunities for a wide range of commercial applications, such as precision advertising and recommender systems. Although a few studies have explored this task, little research has addressed the unstructured nature of both the visual and the textual modalities, or considered how to learn effective representations from multiple modalities for popularity prediction. To this end, we propose the User-guided Hierarchical Attention Network (UHAN), which uses two novel user-guided attention mechanisms to hierarchically attend to both the visual and the textual modalities. UHAN not only learns an effective representation for each modality but also fuses them into an integrated multi-modal representation under the guidance of a user embedding. Since no benchmark dataset exists, we extend a publicly available social image dataset with descriptions of the images. Comprehensive experiments demonstrate the soundness of UHAN's design and show that it outperforms several strong alternatives.
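To make the idea of "user-guided hierarchical attention" concrete, here is a minimal NumPy sketch. It is not the UHAN formulation, which this abstract does not specify. It assumes additive (Bahdanau-style) attention whose scores are conditioned on a user embedding, applied first within each modality (image regions, description words) and then across the two resulting modality vectors, followed by a linear regression head. All names (`UserGuidedAttention`, `build_model`, `predict_popularity`), dimensions, and the projection into a shared space are hypothetical choices made for illustration. The weights are random and untrained.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()


class UserGuidedAttention:
    """Additive attention whose scores depend on a user embedding (assumed form):
    score_i = v^T tanh(W_f f_i + W_u u)."""

    def __init__(self, feat_dim, user_dim, hidden_dim, rng):
        self.W_f = rng.normal(scale=0.1, size=(hidden_dim, feat_dim))
        self.W_u = rng.normal(scale=0.1, size=(hidden_dim, user_dim))
        self.v = rng.normal(scale=0.1, size=hidden_dim)

    def __call__(self, feats, user):
        # feats: (n, feat_dim); user: (user_dim,)
        # The user term (hidden_dim,) is broadcast over all n items
        hidden = np.tanh(feats @ self.W_f.T + self.W_u @ user)  # (n, hidden_dim)
        alpha = softmax(hidden @ self.v)                         # (n,), sums to 1
        return alpha @ feats, alpha                              # (feat_dim,), (n,)


def build_model(region_dim=512, word_dim=300, user_dim=32,
                shared_dim=64, hidden_dim=64, seed=0):
    # Hypothetical parameter set with random, untrained weights
    rng = np.random.default_rng(seed)
    return {
        "visual_att": UserGuidedAttention(region_dim, user_dim, hidden_dim, rng),
        "text_att": UserGuidedAttention(word_dim, user_dim, hidden_dim, rng),
        # Projections that map both modalities into one shared space for fusion
        "P_v": rng.normal(scale=0.1, size=(shared_dim, region_dim)),
        "P_t": rng.normal(scale=0.1, size=(shared_dim, word_dim)),
        "fusion_att": UserGuidedAttention(shared_dim, user_dim, hidden_dim, rng),
        "w_out": rng.normal(scale=0.1, size=shared_dim),
        "b_out": 0.0,
    }


def predict_popularity(regions, words, user, model):
    # Level 1: user-guided attention within each modality
    v_vec, _ = model["visual_att"](regions, user)  # attended image-region feature
    t_vec, _ = model["text_att"](words, user)      # attended word feature
    # Project both modality vectors into the shared space and stack them
    modal = np.stack([model["P_v"] @ v_vec, model["P_t"] @ t_vec])  # (2, shared_dim)
    # Level 2: user-guided attention across the two modalities (fusion)
    fused, beta = model["fusion_att"](modal, user)
    # Linear regression head producing a scalar popularity score
    return float(model["w_out"] @ fused + model["b_out"]), beta


if __name__ == "__main__":
    model = build_model()
    rng = np.random.default_rng(1)
    regions = rng.normal(size=(49, 512))  # e.g. a 7x7 grid of CNN region features
    words = rng.normal(size=(12, 300))    # e.g. 12 word embeddings of a description
    user = rng.normal(size=32)            # hypothetical learned user embedding
    score, beta = predict_popularity(regions, words, user, model)
    print("score:", score, "modality weights:", beta)
```

Conditioning the attention scores on the user embedding means that the same image and description can be weighted differently for different users. The second attention level yields per-modality weights (`beta`) instead of a fixed concatenation. In practice the parameters would be learned by minimizing a regression loss on observed popularity scores.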
