Physical symbol grounding and instance learning through demonstration and eye tracking

It is natural for humans to work with abstract plans, which offer an intuitive and concise way to represent a task. However, high-level task descriptions contain symbols and concepts that must be grounded in the environment before an autonomous robot can execute the plan. Learning the mapping between abstract plan symbols and their physical instances in the environment is known as physical symbol grounding. In this paper, we propose a framework for Grounding and Learning Instances through Demonstration and Eye tracking (GLIDE). We associate traces of a task demonstration with sequences of fixations, which we call fixation programs, and exploit their properties to perform physical symbol grounding. We formulate the problem as a probabilistic generative model and present an algorithm for computationally feasible inference over it. A key aspect of our work is that we estimate fixation locations within the environment, which enables the appearance of symbol instances to be learnt. Instance learning is crucial when the robot has no prior knowledge of the models or appearance of the symbols referred to in the plan instructions. We have conducted human experiments and demonstrate that GLIDE successfully grounds plan symbols and learns the appearance of their instances, enabling robots to autonomously execute tasks in initially unknown environments.
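
To make the grounding step concrete, the short sketch below illustrates the general idea of inferring symbol-to-object assignments from fixation data. It is a minimal illustration rather than the paper's actual generative model: the Gaussian fixation-noise assumption, the noise scale, the symbol names, and the brute-force enumeration over candidate object locations are all assumptions introduced here for clarity.

    # Minimal sketch (illustrative, not the GLIDE model): each plan symbol is
    # assumed to ground to one latent object location, and fixations recorded
    # while that plan step is demonstrated are modelled as Gaussian noise
    # around the grounded location.
    import itertools
    import numpy as np

    def log_likelihood(fixations_per_step, step_symbols, groundings, sigma=20.0):
        """Log-probability of the observed fixations given a symbol -> location map."""
        ll = 0.0
        for fixations, symbol in zip(fixations_per_step, step_symbols):
            mu = groundings[symbol]
            for f in fixations:
                ll += -np.sum((np.asarray(f, float) - mu) ** 2) / (2.0 * sigma ** 2)
        return ll

    def ground_symbols(fixations_per_step, step_symbols, candidate_locations):
        """Assign each distinct plan symbol to one candidate object location by
        enumerating assignments and keeping the most likely one."""
        symbols = sorted(set(step_symbols))
        best, best_ll = None, -np.inf
        for locs in itertools.permutations(candidate_locations, len(symbols)):
            groundings = {s: np.asarray(l, float) for s, l in zip(symbols, locs)}
            ll = log_likelihood(fixations_per_step, step_symbols, groundings)
            if ll > best_ll:
                best, best_ll = groundings, ll
        return best

    # Hypothetical usage: two demonstrated plan steps referring to "cup" then
    # "kettle", with fixations clustered near two detected image regions.
    fixations = [[(102, 98), (95, 105)], [(310, 212), (298, 220)]]
    print(ground_symbols(fixations, ["cup", "kettle"], [(100, 100), (300, 215)]))

In the full framework, a likelihood of this kind would be combined with priors over the plan structure and with fixation locations estimated in the environment, and inference would be carried out over the resulting generative model rather than by exhaustive enumeration.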
