Multimodal Dialog System: Relational Graph-based Context-aware Question Understanding

Multimodal dialog system has attracted increasing attention from both academia and industry over recent years. Although existing methods have achieved some progress, they are still confronted with challenges in the aspect of question understanding (i.e., user intention comprehension). In this paper, we present a relational graph-based context-aware question understanding scheme, which enhances the user intention comprehension from local to global. Specifically, we first utilize multiple attribute matrices as the guidance information to fully exploit the product-related keywords from each textual sentence, strengthening the local representation of user intentions. Afterwards, we design a sparse graph attention network to adaptively aggregate effective context information for each utterance, completely understanding the user intentions from a global perspective. Moreover, extensive experiments over a benchmark dataset show the superiority of our model compared with several state-of-the-art baselines.

[1]  Arash Einolghozati,et al.  Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection In Task Oriented Dialog , 2019, AAAI.

[2]  Mitesh M. Khapra,et al.  Towards Building Large Scale Multimodal Domain-Aware Conversation Systems , 2017, AAAI.

[3]  Qi Tian,et al.  Cross-modal Moment Localization in Videos , 2018, ACM Multimedia.

[4]  Jianfeng Gao,et al.  Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.

[5]  Rui Yan,et al.  Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System , 2016, SIGIR.

[6]  Tat-Seng Chua,et al.  Knowledge-aware Multimodal Dialogue Systems , 2018, ACM Multimedia.

[7]  Kee-Eung Kim,et al.  End-to-End Neural Pipeline for Goal-Oriented Dialogue Systems using GPT-2 , 2020, ACL.

[8]  Chen Cui,et al.  User Attention-guided Multimodal Dialog Systems , 2019, SIGIR.

[9]  Fumin Shen,et al.  Chat More: Deepening and Widening the Chatting Topic via A Deep Model , 2018, SIGIR.

[10]  Joelle Pineau,et al.  Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.

[11]  Jianfeng Gao,et al.  Challenges in Building Intelligent Open-domain Dialog Systems , 2019, ACM Trans. Inf. Syst..

[12]  Zhoujun Li,et al.  Sequential Match Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots , 2016, ArXiv.

[13]  Wei-Ying Ma,et al.  Topic Aware Neural Response Generation , 2016, AAAI.

[14]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[15]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[16]  Deng Cai,et al.  Dialogue Act Recognition via CRF-Attentive Structured Network , 2017, SIGIR.

[17]  Xiangnan He,et al.  MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video , 2019, ACM Multimedia.

[18]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[19]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[20]  Dongyan Zhao,et al.  Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection , 2020, SIGIR.

[21]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[22]  Zan Gao,et al.  Pairwise View Weighted Graph Network for View-based 3D Model Retrieval , 2020, SIGIR.

[23]  Jianfeng Liu,et al.  GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning , 2020, SIGIR.

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Jun Huang,et al.  Response Ranking with Deep Matching Networks and External Knowledge in Information-seeking Conversation Systems , 2018, SIGIR.

[26]  Zhijian Ou,et al.  Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context , 2019, AAAI.

[27]  Qi Tian,et al.  Multimodal Dialog System: Generating Responses via Adaptive Decoders , 2019, ACM Multimedia.

[28]  Jianfeng Gao,et al.  End-to-End Task-Completion Neural Dialogue Systems , 2017, IJCNLP.

[29]  Pushpak Bhattacharyya,et al.  Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System , 2019, ACL.

[30]  Dongyan Zhao,et al.  Joint Learning of Response Ranking and Next Utterance Suggestion in Human-Computer Conversation System , 2017, SIGIR.

[31]  Jianfeng Gao,et al.  Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access , 2016, ACL.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jian Wang,et al.  Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems , 2020, COLING.

[34]  Jun Ma,et al.  NeuroStylist: Neural Compatibility Modeling for Clothing Matching , 2017, ACM Multimedia.

[35]  Hang Li,et al.  Neural Responding Machine for Short-Text Conversation , 2015, ACL.

[36]  Chong-Wah Ngo,et al.  Interpretable Multimodal Retrieval for Fashion Products , 2018, ACM Multimedia.

[37]  Jiliang Tang,et al.  A Survey on Dialogue Systems: Recent Advances and New Frontiers , 2017, SKDD.

[38]  Liqiang Nie,et al.  Neural Multimodal Cooperative Learning Toward Micro-Video Understanding , 2020, IEEE Transactions on Image Processing.

[39]  Enhong Chen,et al.  Multimodal Dialogue Systems via Capturing Context-aware Dependencies of Semantic Elements , 2020, ACM Multimedia.

[40]  Meng Liu,et al.  Online Data Organizer: Micro-Video Categorization by Structure-Guided Multimodal Dictionary Learning , 2019, IEEE Transactions on Image Processing.

[41]  Hongxia Yang,et al.  Towards Knowledge-Based Recommender Dialog System , 2019, EMNLP.

[42]  Yi Zhang,et al.  Conversational Recommender System , 2018, SIGIR.

[43]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.