Suppressing Speech Collisions by Slowing Down Dialogue and Using Artificial Subtle Expressions

We argue that task-oriented spoken dialogue systems and communication robots do not need to respond verbally quickly, as long as they respond non-verbally quickly by showing their internal states through an artificial subtle expression. This paper describes an experiment whose results support this claim. In the experiment, 48 participants performed reservation tasks with a spoken dialogue system coupled with an interface robot that used a blinking light expression. The blinking light is designed as an artificial subtle expression that intuitively notifies the user of the robot's internal state (such as processing). Its purpose is to reduce speech collisions, which arise from turn-taking failures caused by end-of-turn misdetection. Speech collisions harm smooth spoken communication and degrade system usability. Two experimental factors were set up: the blinking light factor (with or without a blinking light) and the reply speed factor (moderate or slow reply speed), resulting in four experimental conditions. The results suggest that both the slow reply speed and the blinking light expression can reduce speech collisions and improve users' impressions. Contrary to expectation, no degradation in evaluation due to the slow reply speed was found.
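The two-factor design described above can be enumerated in a few lines of Python. This is only an illustrative sketch: the variable names and level labels are assumptions chosen for readability, and the abstract does not state how the 48 participants were assigned to conditions, so no assignment is modeled here.

```python
from itertools import product

# Levels of the two experimental factors named in the abstract.
# The label strings are illustrative, not taken from the paper.
BLINKING_LIGHT = ("with blinking light", "without blinking light")
REPLY_SPEED = ("moderate reply speed", "slow reply speed")

# Crossing the two factors yields the 2 x 2 = 4 experimental conditions.
conditions = list(product(BLINKING_LIGHT, REPLY_SPEED))

for index, (light, speed) in enumerate(conditions, start=1):
    print(f"Condition {index}: {light}, {speed}")
```

Crossing both factors fully is what allows the effect of the blinking light and the effect of reply speed to be examined separately as well as in combination.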
