A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction

Despite decades of research, understanding human manipulation activities remains one of the most attractive and challenging research topics in computer vision and robotics. Recognizing and predicting observed human manipulation actions is rooted in applications such as human-robot interaction and robot learning from demonstration. Current research relies heavily on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. These networks, however, incur immense computational cost in order to process high-dimensional raw data. In contrast to related work, we introduce a deep graph autoencoder that jointly learns recognition and prediction of manipulation tasks from symbolic scene graphs rather than from structured Euclidean data. Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting the future graphs. The input to the network is a set of semantic graphs that store the spatial relations between subjects and objects in the scene. The output is a label set representing the detected and predicted class types. We benchmark our model against different state-of-the-art methods on two datasets, MANIAC and MSRC-9, and show that it can achieve better performance. We also release our source code at https://github.com/gamzeakyol/GNet.
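To make the two-branch structure concrete, the following is a minimal forward-pass sketch in dependency-free Python. It is not the released GNet implementation. The layer sizes, the GCN-style normalization, the mean pooling, the untrained random weights, and all function and parameter names are assumptions chosen for illustration. In particular, the prediction branch is simplified here to a class distribution over the upcoming action rather than a decoder for future graphs.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in GCN-style layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(adj, feats, params, rng):
    """Encode a scene graph, sample a latent code, and run both branches."""
    a_norm = normalize_adjacency(adj)
    # Shared graph-convolution layer with ReLU.
    hidden = [[max(0.0, x) for x in row]
              for row in matmul(matmul(a_norm, feats), params["w_hidden"])]
    aggregated = matmul(a_norm, hidden)
    # Variational part: per-node mean and log-variance of the latent code.
    mu = matmul(aggregated, params["w_mu"])
    log_var = matmul(aggregated, params["w_logvar"])
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(m_row, lv_row)]
         for m_row, lv_row in zip(mu, log_var)]
    # Mean-pool node codes into a single graph-level vector.
    graph_code = [[sum(col) / len(z) for col in zip(*z)]]
    # Branch 1: class of the observed graph. Branch 2: class of the upcoming action.
    recognition = softmax(matmul(graph_code, params["w_recognition"])[0])
    prediction = softmax(matmul(graph_code, params["w_prediction"])[0])
    return recognition, prediction


rng = random.Random(0)
# Hypothetical scene graph with three nodes: hand, cup, table.
# An edge marks a spatial relation (here: hand-cup and cup-table).
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
node_features = [[1.0, 0.0, 0.0],   # one-hot node type
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]
num_classes = 4                      # arbitrary, not taken from the datasets
params = {
    "w_hidden": random_matrix(3, 8, rng),
    "w_mu": random_matrix(8, 4, rng),
    "w_logvar": random_matrix(8, 4, rng),
    "w_recognition": random_matrix(4, num_classes, rng),
    "w_prediction": random_matrix(4, num_classes, rng),
}
recognition_probs, prediction_probs = forward(adjacency, node_features, params, rng)
print("recognized class:", recognition_probs.index(max(recognition_probs)))
print("predicted class:", prediction_probs.index(max(prediction_probs)))
```

Both heads read the same pooled latent code, which is one plausible way to realize joint learning: a training loss would combine the two classification terms with the usual KL regularizer on `mu` and `log_var`.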
