A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction

Picking up objects requested by a human user is a common task in human-robot interaction. When multiple objects match the user's verbal description, the robot needs to clarify which object the user is referring to before executing the action. Previous research has focused on perceiving the user's multimodal behaviour to complement verbal commands, or on minimising the number of follow-up questions to reduce task time. In this paper, we propose a system for reference disambiguation based on visualisation and compare three methods for disambiguating natural language instructions. In a controlled experiment with a YuMi robot, we investigated real-time augmentation of the workspace in three conditions (head-mounted display, projector, and a monitor as the baseline) using objective measures such as task time and accuracy, and subjective measures such as engagement, immersion, and display interference. We found significant differences in accuracy and engagement between the conditions, but no difference in task time. Despite the higher error rate in the head-mounted display condition, participants found that modality more engaging than the other two, but overall preferred the projector condition over both the monitor and the head-mounted display.
