A Neural Temporal Model for Human Motion Prediction

We propose novel neural temporal models for predicting and synthesizing human motion. They achieve state-of-the-art results in modeling long-term motion trajectories while remaining competitive with prior work in short-term prediction and requiring significantly less computation. Key aspects of our proposed system include: 1) a novel two-level processing architecture that aids in generating planned trajectories, 2) a simple set of easily computable features that integrate derivative information, and 3) a novel multi-objective loss function that helps the model progress gradually from simple next-step prediction to the harder task of multi-step, closed-loop prediction. Our results demonstrate that these innovations improve the modeling of long-term motion trajectories.

Finally, we propose a novel metric, Normalized Power Spectrum Similarity (NPSS), to evaluate the long-term predictive ability of motion synthesis models. It complements the popular mean-squared error (MSE) of Euler joint angles over time. We conduct a user study to determine whether NPSS correlates more strongly than MSE with human evaluation of long-term motion, and find that it does. We release code and additional results (visualizations) for this paper at: https://github.com/cr7anand/neural_temporal_models
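
The abstract does not spell out which derivative features are used. As an illustration only, the following sketch assumes they are first- and second-order finite differences (velocity and acceleration) of the joint angles, concatenated to each frame. The function names and the zero padding of the first frame are hypothetical choices, not the authors' implementation.

```python
from typing import List

Frame = List[float]  # one frame: a vector of joint angles


def finite_difference(seq: List[Frame]) -> List[Frame]:
    """Backward difference x[t] - x[t-1]; the first frame is padded with zeros."""
    if not seq:
        return []
    out = [[0.0] * len(seq[0])]
    for prev, cur in zip(seq, seq[1:]):
        out.append([c - p for c, p in zip(cur, prev)])
    return out


def add_derivative_features(seq: List[Frame]) -> List[Frame]:
    """Concatenate angles, velocities and accelerations for every frame.

    Because of the zero padding, the first velocity row and the first two
    acceleration rows are only approximate.
    """
    vel = finite_difference(seq)
    acc = finite_difference(vel)
    return [x + v + a for x, v, a in zip(seq, vel, acc)]


if __name__ == "__main__":
    angles = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]  # 3 frames, 2 joint angles
    for row in add_derivative_features(angles):
        print(row)
```

Each output frame triples the feature dimension; these features are cheap to compute because they need only subtractions between neighbouring frames.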
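
The multi-objective loss is described only as moving the model from next-step prediction toward multi-step, closed-loop prediction. A minimal sketch of that idea follows. It assumes a linear blend between a teacher-forced next-step loss and a closed-loop rollout loss; the linear ramp, the `ramp_epochs` parameter, and the one-step `Model` interface are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

Frame = List[float]
Model = Callable[[Frame], Frame]  # hypothetical one-step predictor: frame_t -> frame_{t+1}


def frame_mse(a: Frame, b: Frame) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def next_step_loss(model: Model, seq: List[Frame]) -> float:
    """Teacher forcing: every prediction is conditioned on the ground-truth frame."""
    errs = [frame_mse(model(seq[t]), seq[t + 1]) for t in range(len(seq) - 1)]
    return sum(errs) / len(errs)


def closed_loop_loss(model: Model, seq: List[Frame]) -> float:
    """Closed loop: after seeding with seq[0], the model consumes its own outputs."""
    frame, errs = seq[0], []
    for target in seq[1:]:
        frame = model(frame)
        errs.append(frame_mse(frame, target))
    return sum(errs) / len(errs)


def curriculum_loss(model: Model, seq: List[Frame], epoch: int, ramp_epochs: int = 10) -> float:
    """Linearly shift weight from the easy next-step term to the hard closed-loop term."""
    alpha = min(1.0, epoch / ramp_epochs)  # 0 at the start, 1 after ramp_epochs
    return (1.0 - alpha) * next_step_loss(model, seq) + alpha * closed_loop_loss(model, seq)


def biased(frame: Frame) -> Frame:
    """Toy predictor whose velocity is slightly wrong (+1.5 instead of +1.0)."""
    return [frame[0] + 1.5]


if __name__ == "__main__":
    seq = [[0.0], [1.0], [2.0], [3.0]]
    for epoch in (0, 5, 10):
        print(epoch, round(curriculum_loss(biased, seq, epoch), 4))
```

The toy `biased` predictor shows why the closed-loop term is harder. Its error is the same 0.5 at every teacher-forced step, but the error compounds once the model consumes its own outputs, so the loss grows from 0.25 at epoch 0 to about 1.17 once the ramp completes. In practice these terms would be computed on tensors in an autodiff framework; plain Python is used here only to keep the sketch self-contained.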
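
NPSS is only named in the abstract. The following sketch reflects our reading of the metric, which should be checked against the released code. It computes each feature's power spectrum with a DFT and normalizes it to sum to 1. It then compares ground truth and prediction with the one-dimensional Earth Mover's Distance (the L1 distance between cumulative spectra). Finally, it averages over features, weighted by ground-truth power. The toy signals are invented to contrast NPSS with MSE and do not come from the paper's experiments.

```python
import cmath
import math
from typing import List, Sequence

Frame = List[float]


def power_spectrum(x: Sequence[float]) -> List[float]:
    """|DFT|^2 of a 1D signal (naive O(n^2) DFT, fine for a sketch)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n)
    ]


def normalize(p: List[float]) -> List[float]:
    """Scale a spectrum to sum to 1; an all-zero spectrum stays all-zero."""
    total = sum(p)
    return [v / total for v in p] if total > 0 else [0.0] * len(p)


def emd_1d(a: List[float], b: List[float]) -> float:
    """EMD between histograms on the same unit-spaced bins = L1 distance of CDFs."""
    cdf_a = cdf_b = dist = 0.0
    for x, y in zip(a, b):
        cdf_a += x
        cdf_b += y
        dist += abs(cdf_a - cdf_b)
    return dist


def npss(gt: List[Frame], pred: List[Frame]) -> float:
    """Power-weighted mean of per-feature spectral EMDs (lower is better)."""
    emds, weights = [], []
    for d in range(len(gt[0])):
        g = power_spectrum([f[d] for f in gt])
        p = power_spectrum([f[d] for f in pred])
        emds.append(emd_1d(normalize(g), normalize(p)))
        weights.append(sum(g))  # features with more ground-truth power count more
    total = sum(weights)
    if total == 0:
        return sum(emds) / len(emds)
    return sum(e * w for e, w in zip(emds, weights)) / total


def mse(gt: List[Frame], pred: List[Frame]) -> float:
    """Mean squared error over all frames and features."""
    sq = [(g - p) ** 2 for fg, fp in zip(gt, pred) for g, p in zip(fg, fp)]
    return sum(sq) / len(sq)


if __name__ == "__main__":
    n = 32
    gt = [[math.sin(2 * math.pi * t / 8)] for t in range(n)]
    shifted = [[math.sin(2 * math.pi * (t + 2) / 8)] for t in range(n)]  # right motion, wrong phase
    frozen = [[0.1] for _ in range(n)]  # near-static pose
    print("shifted: MSE=%.3f NPSS=%.3f" % (mse(gt, shifted), npss(gt, shifted)))
    print("frozen:  MSE=%.3f NPSS=%.3f" % (mse(gt, frozen), npss(gt, frozen)))
```

On this toy input, MSE prefers the near-static `frozen` prediction (0.51 versus 1.0 for `shifted`). NPSS prefers `shifted` (about 0 versus 16), because a phase shift leaves the power spectrum unchanged. This kind of disagreement is why a spectrum-based score can complement MSE for long-term motion.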
