A Response Retrieval Approach for Dialogue Using a Multi-Attentive Transformer

This paper presents our work for the ninth edition of the Dialogue System Technology Challenge (DSTC9). Our solution addresses the track number four: Simulated Interactive MultiModal Conversations. The task consists in providing an algorithm able to simulate a shopping assistant that supports the user with his/her requests. We address the task of response retrieval, that is the task of retrieving the most appropriate agent response from a pool of response candidates. Our approach makes use of a neural architecture based on transformer with a multi-attentive structure that conditions the response of the agent on the request made by the user and on the product the user is referring to. Final experiments on the SIMMC Fashion Dataset show that our approach achieves the second best scores on all the retrieval metrics defined by the organizers. The source code is available at https://github.com/D2KLab/dstc9-SIMMC.

[1]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[4]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[5]  David Whitney,et al.  Situated and Interactive Multimodal Conversations , 2020, COLING.

[6]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[7]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[8]  Dilek Z. Hakkani-Tür,et al.  Overview of the Ninth Dialog System Technology Challenge: DSTC9 , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  A. M. Turing,et al.  Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.

[11]  José M. F. Moura,et al.  Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog , 2017, EMNLP.

[12]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[13]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[14]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  José M. F. Moura,et al.  Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Antoine Raux,et al.  The Dialog State Tracking Challenge Series , 2014, AI Mag..

[18]  Quoc V. Le,et al.  Towards a Human-like Open-Domain Chatbot , 2020, ArXiv.

[19]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.