Anticipating many futures: Online human motion prediction and synthesis for human-robot collaboration

Fluent and safe interaction between humans and robots requires both partners to anticipate the other's actions. A common approach to human intention inference is to model specific trajectories towards known goals with supervised classifiers. However, these approaches neither take possible future movements into account nor make use of kinematic cues, such as legible and predictable motion. The bottleneck of these methods is the lack of an accurate model of general human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target-specific training data. The advantage of our probabilistic approach is the ability to draw samples of possible future motions. Finally, we investigate how movements and kinematic cues are represented on the learned low-dimensional manifold.
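The windowed prediction setup described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the one-dimensional "joint position" data, the standard-normal prior and the `toy_decoder` are all assumptions made for illustration, and a trained decoder network would take the place of the toy one.

```python
import random

def make_windows(frames, past_len, future_len):
    """Split a frame sequence into (past window, future window) pairs."""
    pairs = []
    last_start = len(frames) - past_len - future_len
    for start in range(last_start + 1):
        past = frames[start:start + past_len]
        future = frames[start + past_len:start + past_len + future_len]
        pairs.append((past, future))
    return pairs

def sample_futures(past, decoder, latent_dim, num_samples, rng):
    """Draw several candidate futures for one past window.

    Assumption: latent codes are drawn from a standard normal prior.
    """
    futures = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        futures.append(decoder(z, past))
    return futures

def toy_decoder(z, past, future_len=3):
    """Hypothetical stand-in for a trained decoder: extrapolates the last
    observed velocity, perturbed by the first latent coordinate."""
    velocity = past[-1] - past[-2] + 0.1 * z[0]
    return [past[-1] + velocity * (k + 1) for k in range(future_len)]

# Toy 1-D "joint position" per frame; real skeletal frames would be vectors.
frames = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
pairs = make_windows(frames, past_len=4, future_len=3)
futures = sample_futures(pairs[0][0], toy_decoder, latent_dim=2,
                         num_samples=5, rng=random.Random(0))
```

Each element of `futures` is one plausible continuation of the same past window, which is what distinguishes a sampling-based predictor from one that returns a single deterministic trajectory.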
