Towards Sentiment-Aware Multi-Modal Dialogue Policy Learning

Creating a task-oriented dialogue/virtual agent (VA) capable of managing complex, domain-specific user queries spanning multiple intents is difficult because the agent must handle several subtasks simultaneously. Most end-to-end dialogue systems, however, feed only textual user semantics into the learning process and neglect useful user behaviour and information from other modalities such as images. This underlines the benefit of incorporating multi-modal inputs for eliciting user preferences in the task. The user's sentiment also plays a significant role in achieving maximum user/customer satisfaction during the conversation, so it is important to incorporate sentiment during policy learning, especially when serving composite user goals. To build a multi-modal, sentiment-aided VA for multi-intent conversations, this paper introduces a new dataset, Vis-SentiVA (Visual and Sentiment aided VA), created from an openly accessible conversational dataset. We present a hierarchical reinforcement learning (HRL), specifically options-based, VA that learns policies for serving multi-intent dialogues. Multi-modal (text and image) information extraction for identifying user preferences is incorporated into the learning framework, and a combination of task-based and sentiment-based rewards is integrated into the hierarchical value functions to make the VA user-adaptive. Empirically, we show that these components, induced together in the learning framework, play a vital role in achieving higher dialogue task success and greater user contentment when creating composite-natured VAs. To the best of our knowledge, this is the first effort to integrate sentiment-aware rewards into a multi-modal HRL framework. The paper highlights that including additional modes of information such as images, together with behavioural cues such as user sentiment, is essential for securing greater user contentment and for improving the success of composite-natured VAs serving task-oriented dialogues.
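The abstract states that task-based and sentiment-based rewards are combined inside the hierarchical (options-based) value functions. Below is a minimal, hedged sketch of one way such a blended reward could feed a standard SMDP Q-learning update over options; the weight LAMBDA_SENT, the tabular Q-table, and the toy state/option sizes are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

GAMMA = 0.95          # discount factor
ALPHA = 0.1           # learning rate
LAMBDA_SENT = 0.5     # assumed weight on the sentiment-based reward term

n_states, n_options = 10, 4
Q = np.zeros((n_states, n_options))   # Q-values over (state, option) pairs

def combined_reward(task_reward, sentiment_score):
    """Blend the task-success signal with a user-sentiment signal (assumed linear form)."""
    return task_reward + LAMBDA_SENT * sentiment_score

def smdp_q_update(s, o, s_next, cumulative_reward, k):
    """SMDP Q-learning update after option o ran for k primitive steps,
    accruing `cumulative_reward` (already discounted within the option)."""
    target = cumulative_reward + (GAMMA ** k) * Q[s_next].max()
    Q[s, o] += ALPHA * (target - Q[s, o])

# Toy usage: one option terminates after k=3 steps with a positive task
# outcome and mildly positive user sentiment.
r_total = combined_reward(task_reward=1.0, sentiment_score=0.4)
smdp_q_update(s=2, o=1, s_next=5, cumulative_reward=r_total, k=3)
```

The point of the sketch is only that the sentiment term enters the same return that trains the hierarchical value functions, so options that keep the user satisfied are preferred even when task rewards are similar.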
