Gaze and motion information fusion for human intention inference

An algorithm, the gaze-based multiple model intention estimator (G-MMIE), is presented for early prediction of the goal location (intention) of human reaching actions. Arm trajectories in reaching tasks are modeled by an autonomous dynamical system that contracts toward the goal location. A neural network (NN) represents the dynamics of the reaching motion, and its parameters are learned under constraints derived from contraction analysis; these constraints guarantee that all trajectories of the learned system converge to a single equilibrium point. To apply a motion model learned from a few demonstrations in new scenarios with multiple candidate goal locations, an interacting multiple-model (IMM) framework is used: for a given reaching motion, multiple models are obtained by translating the equilibrium point of the contracting system to each known candidate location, so that each model corresponds to a reaching motion ending at the respective candidate. Since humans tend to look toward the location they are reaching for, prior probabilities of the goal locations are computed from the human's gaze. Posterior model probabilities are then computed through interacting model-matched filtering, and the candidate location with the highest posterior probability is chosen as the estimate of the true goal location. Detailed quantitative evaluations of G-MMIE on two datasets involving 15 subjects, together with comparisons against state-of-the-art intention inference algorithms, are presented.
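
As background on the convergence guarantee mentioned above, the following is a minimal sketch of the contraction condition from Lohmiller and Slotine's contraction analysis. The metric M and the parameterization below are generic; the exact constraints imposed on the NN in G-MMIE may differ.

```latex
% Learned dynamics, with f a neural network with parameters \theta:
\dot{x} = f(x;\theta)
% Contraction: for some metric M(x) \succ 0 and rate \lambda > 0,
% uniformly in x,
\frac{\partial f}{\partial x}^{\top} M(x)
  + M(x)\,\frac{\partial f}{\partial x}
  + \dot{M}(x) \;\preceq\; -2\lambda\, M(x)
% All trajectories then converge exponentially to one another, and hence
% to the single equilibrium x^{*} (the demonstrated goal). A model for
% candidate goal g_i is obtained by shifting the equilibrium:
\dot{x} = f\bigl(x - (g_i - x^{*});\,\theta\bigr)
  \quad\Longrightarrow\quad x(t) \to g_i
```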
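
To make the inference step concrete, here is a minimal Python sketch of the Bayesian model-probability update with a gaze-based prior. It is illustrative rather than the authors' implementation: the cosine-similarity gaze prior, the per-model log-likelihoods (e.g., Gaussian innovation likelihoods from model-matched Kalman filters), and all constants are assumptions, and the full IMM mixing step (Markov transitions between models before each filter update) is omitted for brevity.

```python
import numpy as np

def gaze_prior(gaze_dir, head_pos, candidates, kappa=5.0):
    """Prior over candidate goals from gaze: candidates closer to the
    gaze ray get higher probability (illustrative exponential form,
    not necessarily the paper's exact construction)."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    scores = []
    for g in candidates:
        to_g = (g - head_pos) / np.linalg.norm(g - head_pos)
        scores.append(kappa * float(gaze_dir @ to_g))  # scaled cosine
    w = np.exp(np.array(scores) - max(scores))         # stabilized softmax
    return w / w.sum()

def update_model_probs(prior, log_likelihoods):
    """One Bayesian update of the model probabilities.
    log_likelihoods[i] = log p(z_k | model i), e.g. the innovation
    likelihood of the i-th model-matched Kalman filter."""
    log_post = np.log(prior) + np.array(log_likelihoods)
    log_post -= log_post.max()                         # avoid underflow
    post = np.exp(log_post)
    return post / post.sum()

# --- illustrative run with 3 candidate goal locations ---
candidates = np.array([[0.5, 0.2, 0.0], [0.4, -0.3, 0.1], [0.6, 0.0, 0.3]])
head = np.zeros(3)
gaze = np.array([1.0, 0.4, 0.0])            # subject looks toward candidate 0
probs = gaze_prior(gaze, head, candidates)  # gaze-informed prior

for loglik in ([-1.2, -2.0, -2.5], [-0.8, -2.2, -2.4]):  # fake filter outputs
    probs = update_model_probs(probs, loglik)

print("posterior:", probs, "-> goal estimate:", np.argmax(probs))
```

The argmax of the posterior plays the role of the goal estimate; as more of the reaching trajectory is observed, the filter likelihoods increasingly dominate the gaze prior, which matches the intended use of gaze as early, prior information.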
