Web-based environment for user generation of spoken dialog for virtual assistants

This paper presents a web-based spoken dialog generation environment that enables users to edit dialogs for a video virtual assistant and to select the assistant’s 3D motions and tone of voice. In the proposed system, “anyone” can “easily” post and edit dialog content for the dialog system. The system supports only question-and-answer dialogs, in order to avoid conflicts when multiple users edit the content. The spoken dialog sharing service and finite-state transducer (FST) generator produce spoken dialog content for the MMDAgent spoken dialog system toolkit, which includes a speech recognizer, a dialog control unit, a speech synthesizer, and a virtual agent. Dialog content is created from question-and-answer dialogs posted by users and from FST templates. The proposed system was operated for more than a year in a student lounge at the Nagoya Institute of Technology, where users added more than 500 dialogs during the experiment. Images were attached to 65% of the postings. The most frequently posted category was “animation, video games, manga.” The system was also subjected to open examination by tourist information staff who had no prior experience with spoken dialog systems. Based on their impressions of how tourists used the dialog system, they shortened some of the system’s responses and added pauses to the longer responses to make them easier to understand.
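As a rough illustration of how posted question-and-answer dialogs and an FST template might be combined, the following minimal sketch fills one template per posting. It is not the system’s actual generator: the `QAPair` fields, the `generate_fst` function, the tab-separated four-column layout, and the event and command names (`RECOG`, `SPEAK`, `SPEAK_DONE`, `MOTION`) are all assumptions made for illustration and are not taken from the MMDAgent toolkit.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """One user-posted question-and-answer dialog (field names are hypothetical)."""
    question: str          # keyword expected in the recognized utterance
    answer: str            # text the assistant speaks in reply
    motion: str = "bow"    # illustrative motion label
    voice: str = "normal"  # illustrative tone-of-voice label


# Hypothetical template: two tab-separated transitions per pair, each written as
# "state  next_state  input_event  output_command". The event and command
# names are placeholders, not the toolkit's actual vocabulary.
TEMPLATE = (
    "{idle}\t{busy}\tRECOG|{question}\tSPEAK|{voice}|{answer}\n"
    "{busy}\t{idle}\tSPEAK_DONE\tMOTION|{motion}\n"
)


def generate_fst(pairs, idle_state=1, first_free_state=10):
    """Fill the template once per pair, giving each pair its own busy state."""
    blocks = []
    for offset, pair in enumerate(pairs):
        blocks.append(
            TEMPLATE.format(
                idle=idle_state,
                busy=first_free_state + offset,
                question=pair.question,
                answer=pair.answer,
                motion=pair.motion,
                voice=pair.voice,
            )
        )
    return "".join(blocks)


if __name__ == "__main__":
    posted = [
        QAPair("library", "The library is on the second floor."),
        QAPair("lunch", "The cafeteria opens at eleven.", motion="wave", voice="happy"),
    ]
    print(generate_fst(posted), end="")
```

In this sketch every posting branches from the same idle state and returns to it, so each posting contributes an independent block of transitions. That independence is one plausible reading of why restricting the system to question-and-answer dialogs helps avoid conflicts when several users edit the content.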
