Detect, anticipate and generate: Semi-supervised recurrent latent variable models for human activity modeling

Successful Human-Robot collaboration requires a predictive model of human behavior. The robot needs to be able to recognize current goals and actions and to predict future activities in a given context. However, the spatio-temporal sequence of human actions is difficult to model since latent factors such as intention, task, knowledge, intuition and preference determine the action choices of each individual. In this work we introduce semi-supervised variational recurrent neural networks which are able to a) model temporal distributions over latent factors and the observable feature space, b) incorporate discrete labels such as activity type when available, and c) generate possible future action sequences on both feature and label level. We evaluate our model on the Cornell Activity Dataset CAD-120 dataset. Our model outperforms state-of-the-art approaches in both activity and affordance detection and anticipation. Additionally, we show how samples of possible future action sequences are in line with past observations.

[1]  Yoshua Bengio,et al.  A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.

[2]  Silvio Savarese,et al.  Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[4]  Hema Swetha Koppula,et al.  Learning human activities and object affordances from RGB-D videos , 2012, Int. J. Robotics Res..

[5]  Hema Swetha Koppula,et al.  Anticipating Human Activities Using Object Affordances for Reactive Robotic Response , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Yee Whye Teh,et al.  The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.

[7]  Hedvig Kjellström,et al.  Advances in Variational Inference , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[9]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[10]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[11]  Stefanos Nikolaidis,et al.  Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks , 2015, 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[12]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.