Intelligent Conversational Android ERICA Applied to Attentive Listening and Job Interview

Following the success of spoken dialogue systems (SDSs) in smartphone assistants and smart speakers, a number of communicative robots have been developed and commercialized. Compared with conventional SDSs designed as a human-machine interface, interaction with robots is expected to be closer to talking with a human because of their anthropomorphism and physical presence. The goal or task of the dialogue may not be information retrieval, but the conversation itself. In order to realize human-level “long and deep” conversation, we have developed an intelligent conversational android, ERICA. We set up several social interaction tasks for ERICA, including attentive listening, job interview, and speed dating. To allow for spontaneous, incremental multiple utterances, a robust turn-taking model is implemented based on prediction of the transition-relevance place (TRP), and a variety of backchannels are generated based on time frame-wise prediction instead of prediction at the level of inter-pausal units (IPUs). We have realized an open-domain attentive listening system that generates partial repeats and elaborating questions on focus words as well as assessment responses. It has been evaluated with 40 senior people, who engaged in conversations of 5-7 minutes without a conversation breakdown. It was also compared against a Wizard-of-Oz (WOZ) setting. We have also realized a job interview system that asks a set of base questions followed by dynamically generated elaborating questions. It has also been evaluated with student subjects, showing promising results.
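To illustrate the difference between frame-wise and IPU-based backchannel prediction, the following is a minimal sketch, not the ERICA implementation. It assumes some model already outputs a backchannel probability for every fixed-length time frame; the function name `backchannel_times`, the 100 ms frame shift, the threshold, and the minimum gap between backchannels are all hypothetical values chosen for illustration. The point is only that a decision can be made at any frame while the user is still speaking, rather than waiting for the end of an IPU.

```python
from typing import List


def backchannel_times(
    frame_probs: List[float],
    frame_ms: int = 100,      # assumed frame shift (hypothetical value)
    threshold: float = 0.7,   # assumed decision threshold (hypothetical value)
    min_gap_ms: int = 1000,   # assumed minimum spacing between backchannels
) -> List[int]:
    """Return the times (in ms) at which a backchannel would be triggered.

    frame_probs holds one backchannel probability per time frame, as
    produced by some frame-wise predictor (not modeled here).
    """
    times: List[int] = []
    last_time = None
    for index, prob in enumerate(frame_probs):
        t = index * frame_ms
        # Trigger when the probability is high enough and the previous
        # backchannel is far enough in the past.
        if prob >= threshold and (last_time is None or t - last_time >= min_gap_ms):
            times.append(t)
            last_time = t
    return times


if __name__ == "__main__":
    probs = [0.1, 0.8, 0.9, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.75]
    print(backchannel_times(probs))
```

In this sketch the second high-probability frame (at 200 ms) is suppressed because it falls within the minimum gap of the first trigger. An IPU-based predictor would instead make one decision per pause-delimited unit.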
