Multi-modal Attentive Graph Pooling Model for Community Question Answer Matching

Millions of users rely on community question answering (CQA) systems to share knowledge. An essential function of a CQA system is accurately matching answers to a given question. Recent research shows that graph neural networks (GNNs) are effective at modeling content semantics for CQA matching. However, existing GNN-based approaches do not adequately handle two properties of CQA content: it is multi-modal and it is redundant. In this paper, we propose a multi-modal attentive graph pooling approach (MMAGP) that models the multi-modal content of questions and answers with GNNs in a unified framework and addresses both properties. Our model converts each question or answer into a multi-modal content graph, which preserves the relational information within the multi-modal content. Specifically, to exploit visual information, we propose an unsupervised meta-path link prediction approach that extracts labels from visual content and incorporates them into the multi-modal graph. An attentive graph pooling network adaptively selects the vertices of the multi-modal content graph that are significant for matching and generates a pooled graph by aggregating context information for the selected vertices. An interaction pooling network then infers the final matching score from the interactions between the pooled graphs of the input question and answer. Experimental results on two real-world datasets show that MMAGP outperforms other state-of-the-art CQA matching models.
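The attentive graph pooling step described above (select significant vertices, then aggregate context for each selected vertex) can be illustrated with a minimal sketch. This is not the MMAGP implementation: the abstract gives no formulas, so the softmax scoring against a `query` vector, the top-k selection rule, the one-hop neighbourhood used as "context", and the function name `attentive_pool` are all assumptions made for illustration.

```python
import math


def attentive_pool(features, adjacency, query, k):
    """Toy attentive graph pooling (illustrative assumptions, not MMAGP itself).

    features:  list of n vertex feature vectors (lists of floats)
    adjacency: n x n 0/1 matrix given as a list of lists
    query:     hypothetical attention vector with the same dimension as a feature
    k:         number of vertices to keep
    """
    n = len(features)

    # Assumed scoring rule: softmax over dot products with the query vector.
    logits = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    scores = [e / total for e in exps]

    # Keep the k highest-scoring vertices, listed in their original order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])

    # For each kept vertex, aggregate itself and its one-hop neighbours,
    # weighting every contributing vertex by its attention score.
    pooled_features = []
    for i in keep:
        context = [i] + [j for j in range(n) if adjacency[i][j] and j != i]
        weight = sum(scores[j] for j in context)
        pooled_features.append([
            sum(scores[j] * features[j][d] for j in context) / weight
            for d in range(len(query))
        ])

    # The pooled graph keeps only the edges between the selected vertices.
    pooled_adjacency = [[adjacency[i][j] for j in keep] for i in keep]
    return keep, pooled_features, pooled_adjacency


# Usage: a 4-vertex path graph 0-1-2-3 with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.5]]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
keep, pooled, pooled_adj = attentive_pool(features, adjacency, [1.0, 0.0], k=2)
print(keep)  # indices of the selected vertices
```

In this sketch redundancy is handled only by discarding low-scoring vertices; their information survives in the pooled graph to the extent that they are neighbours of a selected vertex.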
