A Multi-modal Approach for Natural Human-Robot Interaction

We present a robot that interacts with people in a natural, multi-modal way using both speech and gesture. The robot tracks people, recognizes speech, and understands language. It tracks people and recognizes gestures with an RGB-D sensor (e.g., a Microsoft Kinect), recognizes speech with a cloud-based service, and understands language with a probabilistic graphical model that infers the meaning of a natural language query. We evaluated the system in two domains. The first is a robot receptionist (roboceptionist): the roboceptionist interacts successfully with people 77% of the time when they are primed with the robot's capabilities, compared to 57% of the time when they are not. The second is a mobile service robot that interacts with people via natural language.
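The abstract does not spell out the structure of the probabilistic graphical model, so the sketch below is only a minimal, hypothetical illustration of grounding a query: it picks the meaning m that maximizes P(m) * prod_w P(w | m), a naive-Bayes-style factorization. The meanings, vocabulary, and probabilities are all invented for the example and are not the authors' model.

```python
import math

# Hypothetical meanings a receptionist robot might resolve a query to.
MEANINGS = ["give_directions", "report_weather", "small_talk"]

# Toy, hand-set parameters (a real system would estimate these from data):
# prior P(meaning) and per-meaning word likelihoods P(word | meaning).
PRIOR = {"give_directions": 0.5, "report_weather": 0.3, "small_talk": 0.2}
WORD_LIKELIHOOD = {
    "give_directions": {"where": 0.3, "office": 0.3, "find": 0.2},
    "report_weather": {"weather": 0.4, "rain": 0.3, "today": 0.1},
    "small_talk": {"hello": 0.4, "name": 0.2, "how": 0.2},
}
SMOOTHING = 1e-3  # likelihood floor for out-of-vocabulary words


def infer_meaning(query: str) -> tuple[str, float]:
    """Return the MAP meaning under P(m) * prod_w P(w | m), with its posterior."""
    words = query.lower().split()
    log_posterior = {}
    for m in MEANINGS:
        score = math.log(PRIOR[m])
        for w in words:
            score += math.log(WORD_LIKELIHOOD[m].get(w, SMOOTHING))
        log_posterior[m] = score
    # Normalize the scores to a posterior distribution over meanings.
    total = math.log(sum(math.exp(s) for s in log_posterior.values()))
    best = max(log_posterior, key=log_posterior.get)
    return best, math.exp(log_posterior[best] - total)


if __name__ == "__main__":
    meaning, prob = infer_meaning("where is the office")
    print(meaning, round(prob, 3))  # e.g. give_directions with a high posterior
```

Working in log space keeps the product of small word likelihoods numerically stable; a deployed system would learn these parameters from transcribed interactions rather than setting them by hand.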
