Fundamental Exploration of Evaluation Metrics for Persona Characteristics of Text Utterances

To maintain the utterance quality of a persona-aware dialog system, utterances that are inappropriate for the persona should be thoroughly filtered out. When evaluating the appropriateness of a large number of arbitrary utterances to be registered in the utterance database of a retrieval-based dialog system, evaluation metrics that require a reference (a “correct” utterance) for each evaluation target cannot be used. In addition, practical utterance filtering requires the ability to select utterances based on the intensity of their persona characteristics. We are therefore developing metrics that capture the intensity of persona characteristics and can be computed without references tailored to the evaluation targets. To this end, we explore existing metrics and propose two new ones: persona speaker probability and persona term salience. Experimental results show that the proposed metrics correlate weakly to moderately with human-judged scores of persona characteristics and outperform other metrics overall in filtering inappropriate utterances for particular personas.
