Graph-based Normalizing Flow for Human Motion Generation and Reconstruction

Data-driven approaches for modeling human skeletal motion have found various applications in interactive media and social robotics. Challenges remain in these fields in generating high-fidelity samples and in robustly reconstructing motion from imperfect input data, for example when marker detections are missed. In this paper, we propose a probabilistic generative model to synthesize and reconstruct long-horizon motion sequences conditioned on past information and control signals, such as the path along which an individual is moving. Our method adapts the existing work MoGlow by introducing a new graph-based model. The model leverages the spatial-temporal graph convolutional network (ST-GCN) to effectively capture the spatial structure and temporal correlation of skeletal motion data at multiple scales. We evaluate the models on a mixture of motion capture datasets of human locomotion using footstep and bone-length analysis. The results demonstrate the advantages of our model in reconstructing missing markers, and it achieves comparable results in generating realistic future poses. When the inputs are imperfect, our model generates motion more robustly.
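
As an illustration of the bone-length analysis mentioned above, the following is a minimal sketch of one way such a check could be computed. It is not the paper's evaluation code: the 5-joint chain skeleton, the function names, and the choice of standard deviation over time as the statistic are all assumptions made for this example. The idea is that a physically plausible motion keeps each bone's length nearly constant across frames.

```python
import numpy as np

# Hypothetical skeleton: a 5-joint chain given as (parent, child) index pairs.
# A real motion capture skeleton would define its own bone list.
BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def bone_lengths(motion, bones=BONES):
    """Return per-frame bone lengths.

    motion: array of shape (T, J, 3) with 3D joint positions over T frames.
    Result: array of shape (T, len(bones)).
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    # Euclidean distance between each child joint and its parent joint.
    return np.linalg.norm(motion[:, children] - motion[:, parents], axis=-1)


def bone_length_variation(motion, bones=BONES):
    """Per-bone standard deviation of length over time (assumed statistic).

    Values near zero indicate that the skeleton stays rigid.
    """
    return bone_lengths(motion, bones).std(axis=0)


if __name__ == "__main__":
    # A straight chain with unit-length bones, translated over 10 frames.
    pose = np.array([[i, 0.0, 0.0] for i in range(5)])
    motion = pose[None] + np.arange(10.0)[:, None, None]
    print(bone_length_variation(motion))
```

Under these assumptions, larger values in the output of `bone_length_variation` would flag generated or reconstructed sequences whose limbs stretch or shrink over time.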
