Predicting human intention in visual observations of hand/object interactions

The main contribution of this paper is a probabilistic method for predicting human manipulation intention from image sequences of human-object interaction. Predicting intention amounts to inferring the imminent manipulation task once a human hand is observed to have stably grasped the object. Inference is performed by means of a probabilistic graphical model that encodes object-grasping tasks over the 3D state of the observed scene. The 3D state is extracted from RGB-D image sequences by a novel vision-based, markerless hand-object 3D tracking framework. To deal with the high-dimensional state space and the mixed data types (discrete and continuous) involved in grasping tasks, we introduce a generative vector quantization method using mixture models and self-organizing maps. This yields a compact model for encoding grasping actions that is capable of handling uncertain and partial sensory data. Experiments show that a model trained on simulated data provides a potent basis for accurate goal inference from partial and noisy observations of real-world demonstrations. We also present a grasp selection process, guided by the inferred human intention, to illustrate the use of the system for goal-directed grasp imitation.
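The mixture-model side of the quantization step can be illustrated with a minimal sketch: continuous grasp-state features are mapped to the components of a fitted generative mixture, whose indices serve as discrete symbols for the graphical model, while the component posteriors provide soft assignments for uncertain or partial observations. This is an illustrative approximation only, not the paper's implementation; the feature dimensionality, component count, and variable names below are assumptions, and the self-organizing-map stage is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical continuous grasp-state features (e.g., hand and object
# pose parameters); the dimensions are chosen for illustration only.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 6))  # 500 observed states, 6-D each

# Fit a generative mixture model; each component acts as one discrete
# "codebook" symbol over which the graphical model is defined.
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(features)

# Hard quantization: map each continuous state to its most likely component.
symbols = gmm.predict(features)

# Soft quantization: posterior probabilities over components, usable as
# soft evidence when the sensory data are noisy or partial.
soft_evidence = gmm.predict_proba(features)
print(symbols[:5], soft_evidence[0].round(3))
```

Because the mixture is generative, unobserved feature dimensions can in principle be marginalized out rather than imputed, which is what makes this style of quantization suited to the partial observations discussed above.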
