Recognition of Affective Communicative Intent in Robot-Directed Speech

Human speech provides a natural and intuitive interface both for communicating with humanoid robots and for teaching them. In general, the acoustic pattern of speech contains three kinds of information: who the speaker is, what the speaker said, and how the speaker said it. This paper focuses on recognizing affective communicative intent in robot-directed speech without looking into the linguistic content. We present an approach for recognizing four distinct prosodic patterns that communicate praise, prohibition, attention, and comfort to preverbal infants. These communicative intents are well matched to teaching a robot, since praise, prohibition, and directing the robot's attention to relevant aspects of a task could be used by a human instructor to intuitively facilitate the robot's learning process. We integrate this perceptual ability into our robot's “emotion” system, thereby allowing a human to directly manipulate the robot's affective state. This has a powerful organizing influence on the robot's behavior, and will ultimately be used to socially communicate affective reinforcement. Communicative efficacy has been tested both with people very familiar with the robot and with naïve subjects.
