When E-commerce Meets Social Media: Identifying Business on WeChat Moment Using Bilateral-Attention LSTM

WeChat Business, built on WeChat, the most widely used instant messaging platform in China, is a new business model that has burst into people's lives in the e-commerce era. In one of the most typical WeChat Business behaviors, users advertise products, advocate companies, and share customer feedback with their WeChat friends by posting a WeChat Moment, a public status that contains images and text. Given the popularity and significance of this behavior, we propose a novel Bilateral-Attention LSTM network (BiATT-LSTM) to identify WeChat Business Moments from their texts and images. Unlike previous schemes that treat the visual and textual modalities equally in a joint visual-textual classification task, we start from a text classification task based on an LSTM network and then incorporate a bilateral-attention mechanism that automatically learns two kinds of explicit attention weights for each word: 1) a global weight that is insensitive to the images in the same Moment as the word, and 2) a local weight that is sensitive to those images. Visual information serves as guidance for determining the local weight of a word in a specific Moment. Two-level experiments demonstrate the effectiveness of our framework, which outperforms other schemes that jointly model visual and textual modalities. We also visualize the bilateral-attention mechanism to illustrate how it helps joint visual-textual classification.
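The two kinds of attention weights described above can be illustrated with a minimal sketch. It is not the BiATT-LSTM implementation: the abstract does not give the scoring functions or how the weights are combined, so everything here is an assumption. That includes the names `bilateral_attention`, `global_scores`, and `image_feature`, the use of softmax, the dot-product scoring against the image, and the product-then-renormalize combination. Hand-set vectors stand in for the LSTM's per-word hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bilateral_attention(word_states, global_scores, image_feature):
    """Pool per-word states with a global and an image-guided local weight.

    word_states   : one vector per word (stand-in for LSTM hidden states)
    global_scores : one score per word, independent of the images (assumed)
    image_feature : one vector summarizing the Moment's images (assumed)
    """
    # Global weight: depends only on the word, not on the images.
    global_w = softmax(global_scores)
    # Local weight: depends on how well each word matches the image feature.
    local_w = softmax([dot(h, image_feature) for h in word_states])
    # Assumed combination rule: multiply both weights, then renormalize.
    combined = [g * l for g, l in zip(global_w, local_w)]
    total = sum(combined)
    weights = [c / total for c in combined]
    # Weighted sum of word states gives one vector for the classifier.
    dim = len(word_states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, word_states))
              for i in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]   # two words, toy 2-d states
    scores = [0.0, 0.0]                 # equal global importance
    weights, pooled = bilateral_attention(states, scores, [2.0, 0.0])
    print(weights, pooled)
```

With the same words and global scores, changing only `image_feature` shifts the final weights, which is the behavior the local weight is meant to capture; the global weight stays fixed across Moments.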
