Self-Attentional Models Application in Task-Oriented Dialogue Generation Systems

Self-attentional models are a new paradigm for sequence modelling tasks that differs from common sequence modelling methods, such as recurrence-based and convolution-based sequence learning, in that its architecture relies solely on the attention mechanism. Self-attentional models have been used to build state-of-the-art models for many NLP tasks, such as neural machine translation, but their use for training end-to-end task-oriented dialogue generation systems has not yet been explored. In this study, we apply these models to three different datasets for training task-oriented chatbots. Our findings show that self-attentional models can be exploited to create end-to-end task-oriented chatbots which not only achieve higher evaluation scores than recurrence-based models, but also do so more efficiently.
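The core operation these architectures stack in place of recurrence is scaled dot-product self-attention, in which every output position is computed as a weighted combination of all input positions. Below is a minimal NumPy sketch of a single attention head, assuming toy dimensions; the function and variable names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence of token embeddings.

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                               # each position mixes all value vectors

# Usage: attend over a toy sequence of 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Because every position attends to every other position in a single step, such models avoid the sequential dependency of recurrent networks, which is the source of the efficiency gains noted above.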
