Conversational Scaffolding: An Analogy-based Approach to Response Prioritization in Open-domain Dialogs

We present Conversational Scaffolding, a response-prioritization technique that capitalizes on the structural properties of existing linguistic embedding spaces. Vector offset operations within the embedding space are used to identify an ‘ideal’ response for each set of inputs. Candidate utterances are scored by their cosine distance from this ideal response, and the top-scoring candidate is selected as conversational output. We apply our method in an open-domain dialog setting and show that the most effective analogy-based strategy outperforms both an approximate nearest-neighbor approach and a naive nearest-neighbor baseline. We also demonstrate the method’s ability to retrieve relevant dialog responses from a repository containing 19,665 random sentences. As an additional contribution, we present the Chit-Chat dataset, a high-quality conversational dataset containing 483,112 lines of friendly, respectful chat exchanges between university students.
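The scoring step described above can be illustrated with a minimal sketch. The abstract does not give the exact analogy formula, so this example assumes one plausible variant: the ideal response is the input embedding plus the mean offset (response minus context) taken over a few exemplar pairs. The function names, the `scaffold_pairs` argument, and the toy 3-dimensional vectors are all hypothetical; a real system would use sentence embeddings from an encoder.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def ideal_response(query, scaffold_pairs):
    """Assumed analogy: query + mean(response - context) over exemplar pairs."""
    offsets = [response - context for context, response in scaffold_pairs]
    return query + np.mean(offsets, axis=0)


def rank_candidates(query, scaffold_pairs, candidates):
    """Return (candidate index, distance) pairs, closest to the ideal first."""
    target = ideal_response(query, scaffold_pairs)
    dists = [cosine_distance(target, c) for c in candidates]
    order = np.argsort(dists)
    return [(int(i), dists[i]) for i in order]


# Toy data (hypothetical): each exemplar pair shifts the context along axis 1.
scaffold_pairs = [
    (np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])),
    (np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])),
]
query = np.array([1.0, 0.0, 0.0])
candidates = [
    np.array([1.0, 0.0, 0.0]),
    np.array([2.0, 2.0, 0.0]),
    np.array([0.0, 0.0, 1.0]),
]

ranking = rank_candidates(query, scaffold_pairs, candidates)
best_index, best_distance = ranking[0]
print(best_index, best_distance)
```

Because cosine distance ignores vector length, a candidate pointing in the same direction as the ideal response ranks first regardless of its magnitude. Under this scoring, the selected output is the candidate with the smallest distance.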