Human Motion Generation via Cross-Space Constrained Sampling

We aim to automatically generate a human motion sequence from a single input image of a person and a specified action label. To this end, we propose a cross-space human motion video generation network with two paths: a forward path that first samples a sequence of low-dimensional motion vectors from a Gaussian Process (GP) and pairs it with the input person image to form a moving human figure sequence; and a backward path that re-extracts the corresponding latent motion representations from the predicted human images. Because supervision is lacking, the reconstructed latent motion representations are expected to be as close as possible to the GP-sampled ones, which yields a cyclic objective function for mutually constrained generation across the two spaces (i.e., motion and appearance). We further propose an alternative sampling/generation algorithm that respects the constraints from both spaces. Extensive experimental results show that the proposed framework successfully generates novel human motion sequences with reasonable visual quality.
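The GP sampling step and the cyclic objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the squared-exponential kernel, the length scale, the jitter term, the mean-squared-error form of the loss, and the names `rbf_kernel`, `sample_motion_latents`, and `cycle_loss` are all assumptions made for illustration. The image generator and the backward-path encoder are neural networks in the paper and are not modelled here.

```python
import math
import random


def rbf_kernel(times, length_scale=3.0, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance over frame indices (assumed kernel choice)."""
    n = len(times)
    return [
        [
            variance * math.exp(-0.5 * ((times[i] - times[j]) / length_scale) ** 2)
            + (jitter if i == j else 0.0)  # jitter keeps the matrix positive definite
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular factor L with L * L^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_motion_latents(num_frames, latent_dim, seed=None, length_scale=3.0):
    """Draw a temporally smooth latent sequence: one zero-mean GP draw per dimension."""
    rng = random.Random(seed)
    chol = cholesky(rbf_kernel(list(range(num_frames)), length_scale))
    columns = []
    for _ in range(latent_dim):
        eps = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]
        # A GP sample is L @ eps, where eps is standard normal noise.
        columns.append(
            [sum(chol[t][k] * eps[k] for k in range(t + 1)) for t in range(num_frames)]
        )
    # Transpose to a list of frames, each a latent_dim-sized vector.
    return [[columns[d][t] for d in range(latent_dim)] for t in range(num_frames)]


def cycle_loss(sampled, reextracted):
    """Mean squared difference between sampled and re-extracted latents (assumed form)."""
    total, count = 0.0, 0
    for z, z_hat in zip(sampled, reextracted):
        for a, b in zip(z, z_hat):
            total += (a - b) ** 2
            count += 1
    return total / count


latents = sample_motion_latents(num_frames=8, latent_dim=3, seed=0)
```

In the full pipeline, `latents` would be paired with the input image to generate frames, and the latents re-extracted from those frames would be passed as the second argument of `cycle_loss`. A perfect reconstruction gives a loss of zero.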
