Multi-Objective Diverse Human Motion Prediction with Knowledge Distillation

Accurate and diverse human motion prediction is essential to many industrial applications, especially robotics and autonomous driving. Recent research has explored several techniques to enhance the diversity of human motion prediction while maintaining its accuracy. However, most of them need to define a combined loss, such as a weighted sum of an accuracy loss and a diversity loss, and fix the weights as hyperparameters before training. In this work, we aim to design a prediction framework that can balance accuracy sampling and diversity sampling during the testing phase. To achieve this, we propose a multi-objective conditional variational inference prediction model. We also propose a short-term oracle to encourage the prediction framework to explore more diverse future motions. We evaluate our approach on two standard human motion datasets. The experimental results show that our approach is effective and on par with state-of-the-art performance in terms of accuracy and diversity.
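As a point of reference for the combined-loss baseline mentioned above, the following is a minimal sketch of a weighted-sum objective whose weights are chosen before training. The function name `combined_loss`, the weight names, and the numeric loss values are hypothetical and serve only as illustration; they are not taken from the paper and do not represent the proposed multi-objective model.

```python
def combined_loss(accuracy_loss: float, diversity_loss: float,
                  w_accuracy: float = 1.0, w_diversity: float = 0.5) -> float:
    """Weighted sum of an accuracy term and a diversity term.

    The weights are hyperparameters (hypothetical defaults here) that
    must be fixed before training, so the accuracy/diversity trade-off
    cannot be changed afterwards without retraining.
    """
    return w_accuracy * accuracy_loss + w_diversity * diversity_loss


# Illustrative loss values only; not results from any experiment.
example_accuracy_loss = 0.8
example_diversity_loss = 0.4

# Each choice of w_diversity defines a different training objective.
for w_div in (0.0, 0.5, 2.0):
    total = combined_loss(example_accuracy_loss, example_diversity_loss,
                          w_accuracy=1.0, w_diversity=w_div)
    print(f"w_diversity={w_div}: total={total:.2f}")
```

Each weight setting yields a separately trained model under this scheme, which is the limitation that motivates balancing accuracy and diversity at test time instead.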
