Paper Information - Edina: Building an Open Domain Socialbot with Self-dialogues

Edina: Building an Open Domain Socialbot with Self-dialogues

We present Edina, the University of Edinburgh's socialbot for the Amazon Alexa Prize competition. Edina is a conversational agent whose responses draw on data collected from Amazon Mechanical Turk (AMT) through a new technique we call self-dialogues: conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect, and reflective of relevant and/or trending topics. The self-dialogues provide training data for a generative neural network as well as a basis for soft rules used by a matching score component. Each match of a soft rule against a user utterance is associated with a confidence score, which we show is strongly indicative of reply quality. This allows the component to self-censor and to be integrated effectively with other components. Edina's full architecture features a rule-based system backing off to a matching score, which in turn backs off to a generative neural network. Our hybrid data-driven methodology thus addresses both the coverage limitations of a strictly rule-based approach and the lack of guarantees of a strictly machine-learning approach.
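The back-off architecture described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than Edina's actual implementation: the names `Reply`, `rule_based`, `matching_score`, `generative` and `respond`, the toy prompt/response pairs, the use of Jaccard word overlap as the confidence score, and the threshold value of 0.5. It only shows the control flow: a component answers when its confidence clears the threshold, and otherwise defers to the next component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    text: str
    confidence: float  # in [0, 1]; higher means the component trusts its reply more
    source: str        # which component produced the reply


def rule_based(utterance: str) -> Optional[Reply]:
    # Hypothetical hand-written rule: an exact-match greeting.
    if utterance.lower().strip() in {"hi", "hello"}:
        return Reply("Hello! What would you like to talk about?", 1.0, "rules")
    return None


# Toy stand-in for soft rules derived from self-dialogues: (prompt, response) pairs.
SELF_DIALOGUE_PAIRS = [
    ("have you seen the new star wars movie", "Yes, I loved the lightsaber fights."),
    ("do you like jazz music", "I do, especially Miles Davis."),
]


def matching_score(utterance: str) -> Optional[Reply]:
    # Assumption: Jaccard word overlap stands in for the paper's matching score.
    words = set(utterance.lower().split())
    best: Optional[Reply] = None
    for prompt, response in SELF_DIALOGUE_PAIRS:
        prompt_words = set(prompt.split())
        union = words | prompt_words
        score = len(words & prompt_words) / len(union) if union else 0.0
        if best is None or score > best.confidence:
            best = Reply(response, score, "matching")
    return best


def generative(utterance: str) -> Reply:
    # Placeholder for a neural generator; as the last resort it always answers.
    return Reply("That's interesting, tell me more.", 0.0, "generative")


def respond(utterance: str, threshold: float = 0.5) -> Reply:
    # Try components in order of reliability; a component "self-censors"
    # when its confidence falls below the threshold, and the next one takes over.
    for component in (rule_based, matching_score):
        reply = component(utterance)
        if reply is not None and reply.confidence >= threshold:
            return reply
    return generative(utterance)
```

In this sketch, `respond("hello")` is handled by the rule-based component, `respond("do you like jazz")` clears the threshold in the matching component, and an unrelated utterance such as `respond("what is the weather")` falls through to the generative placeholder.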
