Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback

We describe our Alexa Prize system (called ‘Alana’), which consists of an ensemble of bots combining rule-based and machine learning systems, and which uses a contextual ranking mechanism to choose system responses. This paper reports on the version of the system developed and evaluated in the semi-finals of the competition (i.e. up to 15 August 2017), not on subsequent enhancements. The ranker for this system was trained on real user feedback received during the competition, and we address the problem of how to train on this noisy and sparse feedback. To avoid the inappropriate and boring utterances that initially came from large datasets such as Reddit and Twitter, we later focussed on ‘clean’ data sources such as news and facts. We report on experiments with different ranking functions and versions of our NewsBot. We find that a multi-turn news strategy is beneficial, and that a ranker trained on users' rating feedback is also effective. Our system improved continuously using the data gathered over the course of the competition (1 July – 15 August). Our final user score (the average user rating over the whole semi-finals period) was 3.12, and the average user rating over the last week of the semi-finals (8–15 August 2017) was 3.3. We also achieved long dialogues (10.7 turns on average) during the competition period. In the weeks after the end of the semi-finals, we achieved our highest scores of 3.52 (daily average, 18 October) and 3.45 (weekly average on 23 and 24 October), an average dialogue length of 14.6 turns (1 October), and a median dialogue length of 2.25 minutes (averaged over 7 days, on 10 October).
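The abstract names the architecture (an ensemble of bots whose candidate responses are chosen by a ranker trained on user ratings) but not its details. The block below is a minimal sketch of that general idea under loudly stated assumptions: the bot names (`news`, `chat`), the toy feature set, the linear ranker fit by stochastic gradient descent, and the rule that every turn inherits its dialogue's 1 to 5 rating as a label are all hypothetical illustrations, not Alana's actual implementation.

```python
"""Illustrative ensemble-plus-ranker sketch (hypothetical; not Alana's code)."""
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]


@dataclass
class Candidate:
    bot: str   # name of the (hypothetical) bot that proposed the response
    text: str  # the proposed system response


def extract_features(context: Sequence[str], cand: Candidate) -> Features:
    """Toy contextual features; assumed for illustration only."""
    prev = set(context[-1].lower().split()) if context else set()
    words = cand.text.lower().split()
    overlap = len(prev & set(words)) / max(len(set(words)), 1)
    return {
        "bias": 1.0,
        "length": min(len(words), 30) / 30.0,  # capped, scaled word count
        "overlap": overlap,                     # word overlap with last user turn
        "is_" + cand.bot: 1.0,                  # which bot proposed the response
    }


def score(weights: Features, feats: Features) -> float:
    """Linear ranking score: dot product of weights and features."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train(dialogues: List[Tuple[List[Tuple[List[str], Candidate]], int]],
          epochs: int = 200, lr: float = 0.05) -> Features:
    """Fit a linear ranker by SGD on squared error.

    Assumption: each dialogue has a single 1-5 user rating, and every
    response chosen in that dialogue inherits it as a noisy per-turn label.
    """
    examples = []
    for turns, rating in dialogues:
        target = (rating - 1) / 4.0  # map rating 1..5 onto 0..1
        for context, chosen in turns:
            examples.append((extract_features(context, chosen), target))
    weights: Features = {}
    for _ in range(epochs):
        for feats, target in examples:
            err = score(weights, feats) - target
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) - lr * err * v
    return weights


def choose(weights: Features, context: List[str],
           candidates: List[Candidate]) -> Candidate:
    """Ensemble step: each bot proposes a candidate; the ranker picks the best."""
    return max(candidates, key=lambda c: score(weights, extract_features(context, c)))


# Toy data: a news dialogue rated 5 and a chit-chat dialogue rated 1.
ctx = ["what is going on"]
logs = [
    ([(ctx, Candidate("news", "a probe has reached the outer planets"))], 5),
    ([(ctx, Candidate("chat", "lol ok"))], 1),
]
w = train(logs)
picked = choose(w, ["any updates"], [
    Candidate("chat", "haha sure thing buddy"),
    Candidate("news", "markets rose sharply overnight"),
])
print(picked.bot, picked.text)
```

Propagating one dialogue-level rating to every turn is a simple way to turn sparse feedback into per-turn training targets, but it credits or blames every response in a dialogue equally, which is one reason such labels are noisy.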
