Bayesian human intention inference through multiple model filtering with gaze-based priors

An algorithm called the gaze-based multiple model intention estimator (G-MMIE) is presented for early prediction of the goal location (intention) of human reaching actions. To capture the complexity of human arm reaching motion, the arm dynamics are represented by a neural network (NN), and reaching trajectories are modeled as a dynamical system that contracts towards the goal location; the NN is trained subject to contraction-analysis constraints, which ensures that the model trajectories converge to the goal. To apply a motion model learned from only a few demonstrations to new scenes containing multiple candidate objects, an interacting multiple model (IMM) framework is used: the multiple models are obtained by translating the origin of the contracting system to each known object location, so that each model corresponds to a reaching motion ending at that object. Since humans tend to look toward the object they are reaching for, the prior probabilities of the models are computed from the human's eye gaze. The posterior model probabilities are then computed by interacting model-matched filtering carried out with extended Kalman filters (EKFs), and the object location whose model has the highest posterior probability is taken as the estimate of the goal location. Experimental results suggest that G-MMIE adapts to arbitrary sequences of reaching motions and that the gaze-based prior outperforms a uniform prior in both intention-inference accuracy and average time of inference.
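
The following Python sketch illustrates the inference loop described above under simplifying assumptions, and is not the authors' implementation: a hand-coded linear contracting model x_{k+1} = x_k + alpha*(g - x_k) stands in for the learned NN dynamics, a plain Kalman filter replaces the EKF, and the IMM step is reduced to a static Bayesian update of the model probabilities. All function names and parameter values (e.g. `gaze_prior`, `infer_goal`, `kappa`, `alpha`) are hypothetical.

```python
import numpy as np

def gaze_prior(gaze_dir, head_pos, goals, kappa=5.0):
    """Gaze-based prior over candidate goals: softmax of the cosine similarity
    between the gaze direction and the direction from the head to each goal."""
    sims = []
    for g in goals:
        d = g - head_pos
        sims.append(np.dot(gaze_dir, d) /
                    (np.linalg.norm(gaze_dir) * np.linalg.norm(d)))
    w = np.exp(kappa * np.array(sims))
    return w / w.sum()

def infer_goal(wrist_traj, gaze_dir, head_pos, goals, alpha=0.2,
               q=1e-3, r=1e-2, threshold=0.9):
    """Run one Kalman filter per candidate goal on the observed wrist trajectory,
    update the model probabilities from the innovation likelihoods, and stop
    early once one goal's posterior exceeds `threshold`."""
    n_goals, dim = len(goals), len(wrist_traj[0])
    probs = gaze_prior(gaze_dir, head_pos, goals)               # gaze-based prior
    x = [np.array(wrist_traj[0], dtype=float) for _ in goals]   # per-model state estimates
    P = [np.eye(dim) * 1e-2 for _ in goals]                     # per-model covariances
    F = np.eye(dim) * (1.0 - alpha)     # Jacobian of the contracting dynamics
    Q, R, H = np.eye(dim) * q, np.eye(dim) * r, np.eye(dim)

    for z in wrist_traj[1:]:
        z = np.asarray(z, dtype=float)
        likelihoods = np.zeros(n_goals)
        for j, g in enumerate(goals):
            # Predict with the contracting model attracted to goal j.
            x_pred = x[j] + alpha * (g - x[j])
            P_pred = F @ P[j] @ F.T + Q
            # Measurement update with the observed wrist position.
            S = H @ P_pred @ H.T + R
            K = P_pred @ H.T @ np.linalg.inv(S)
            innov = z - H @ x_pred
            x[j] = x_pred + K @ innov
            P[j] = (np.eye(dim) - K @ H) @ P_pred
            # Gaussian innovation likelihood for this model.
            likelihoods[j] = (np.exp(-0.5 * innov @ np.linalg.inv(S) @ innov)
                              / np.sqrt(np.linalg.det(2.0 * np.pi * S)))
        probs = probs * likelihoods
        probs = probs / probs.sum()
        if probs.max() > threshold:
            break
    return probs, int(np.argmax(probs))

# Example usage (hypothetical 2-D data):
# goals = [np.array([0.5, 0.0]), np.array([0.5, 0.3])]
# probs, k = infer_goal(observed_wrist_positions,
#                       gaze_dir=np.array([1.0, 0.5]),
#                       head_pos=np.array([0.0, 0.2]), goals=goals)
```

A full IMM, as used in the paper, additionally mixes the per-model state estimates at each step according to a Markov model-transition matrix before the filter prediction; the static Bayesian update above omits that mixing to keep the sketch short.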
