Overview of the Eighth Dialog System Technology Challenge: DSTC8

This paper introduces the Eighth Dialog System Technology Challenge (DSTC8). In line with recent challenges, the eighth edition focuses on applying end-to-end dialog technologies pragmatically to four tasks: multi-domain task completion, noetic response selection, audio visual scene-aware dialog, and schema-guided dialog state tracking. This paper describes the task definition, provided datasets, baselines, and evaluation setup for each track. We also summarize the results of the submitted systems to highlight overall trends in state-of-the-art technologies for these tasks.
