Real-Time Human-Robot Communication for Manipulation Tasks in Partially Observed Environments

In human teams, visual and auditory cues often communicate information about the task or environment that is not otherwise directly observable. Analogously, robots that rely primarily on visual sensors cannot directly observe certain object attributes that may be necessary for reference resolution or task execution. The experiments in this paper address natural language interaction in human-robot teams for tasks where multi-modal (e.g., visual, auditory, or haptic) observations are necessary for robust execution. We present a probabilistic model, verified through physical experiments, that allows robots to efficiently acquire knowledge about latent aspects of the workspace through language and physical interaction. The model’s effectiveness is demonstrated on a mobile manipulator and a stationary manipulator following instructions in real-world scenarios under partial knowledge of object states in the environment.
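
As an illustration only, and not the paper's actual model: the sketch below shows how a belief over a latent, visually unobservable attribute (for instance, whether a container is full) could be updated by fusing a language cue with a haptic measurement gathered through physical interaction. The function name, priors, likelihoods, and thresholds are all hypothetical placeholders chosen for clarity.

```python
# Minimal sketch (assumed likelihoods, hypothetical parameters): Bayesian update
# of P(full) for one object, fusing a language cue with a weight measurement.

def update_belief(prior_full: float,
                  language_mentions_full: bool,
                  measured_weight_kg: float) -> float:
    """Return P(full | language, weight) for a single object."""
    # Hypothetical likelihoods for the language cue.
    p_lang_given_full = 0.9 if language_mentions_full else 0.1
    p_lang_given_empty = 0.2 if language_mentions_full else 0.8

    # Hypothetical likelihoods for the haptic (weight) observation:
    # heavier lift readings are more consistent with a full container.
    if measured_weight_kg > 0.5:
        p_weight_given_full, p_weight_given_empty = 0.8, 0.1
    else:
        p_weight_given_full, p_weight_given_empty = 0.2, 0.9

    # Bayes' rule, treating the two observations as conditionally independent.
    joint_full = prior_full * p_lang_given_full * p_weight_given_full
    joint_empty = (1.0 - prior_full) * p_lang_given_empty * p_weight_given_empty
    return joint_full / (joint_full + joint_empty)


if __name__ == "__main__":
    # "Pick up the full bottle" plus a heavy lift reading sharpens the belief.
    posterior = update_belief(prior_full=0.5,
                              language_mentions_full=True,
                              measured_weight_kg=0.7)
    print(f"P(full | language, interaction) = {posterior:.2f}")
```

In this toy setting, starting from an uninformative prior of 0.5, a referring expression mentioning "full" together with a heavy weight reading pushes the posterior close to 1, which is the qualitative behavior the paper's probabilistic model exploits to resolve references under partial observability.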
