论文信息 - Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes

Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes

Path forecasting is a pivotal step toward understanding dynamic scenes and an emerging topic in the computer vi- sion field. This task is challenging due to the multimodal nature of the future, namely, given a partial history, there is more than one plausible prediction. Yet, the state-of-the-art methods seem not fully responsive to this innate variabil- ity. Hence, how to better foresee the forthcoming trajectory in dynamic scenes has to be more thoroughly pursued. To this end, we propose a novel Imitative Decision Learning (IDL) approach. It delves deeper into the key that inher- ently characterizes the multimodality – the latent decision. The proposed IDL first infers the distribution of such latent decisions by learning from moving histories. A policy is then generated by taking the sampled latent decision into account to predict the future. Different plausible upcoming paths corresponds to each sampled latent decision. This ap- proach significantly differs from the mainstream literature that relies on a predefined latent variable to extrapolate di- verse predictions. In order to augment the understanding of the latent decision and resultant mutimodal future, we in- vestigate their connection through mutual information op- timization. Moreover, the proposed IDL integrates spatial and temporal dependencies into one single framework, in contrast to handling them with two-step settings. As a re- sult, our approach enables simultaneous anticipation of the paths of all pedestrians in the scene. We assess our pro- posal on the large-scale SAP, ETH and UCY datasets. The experiments show that IDL introduces considerable margin improvements with respect to recent leading studies.

Yuke Li | Yuke Li

[1] Shu Liu,et al. Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2] Philip H. S. Torr,et al. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Martial Hebert,et al. Patch to the Future: Unsupervised Visual Prediction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Gregory D. Hager,et al. Temporal Convolutional Networks for Action Segmentation and Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Yutaka Satoh,et al. Anticipating Traffic Accidents with Adaptive Loss and Large-Scale Incident DB , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6] Stefano Ermon,et al. InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations , 2017, NIPS.

[7] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.

[8] Yuke Li,et al. Pedestrian Path Forecasting in Crowd: A Deep Spatio-Temporal Perspective , 2017, ACM Multimedia.

[9] Sinisa Todorovic,et al. Temporal Deformable Residual Networks for Action Segmentation in Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.

[11] Yann LeCun,et al. Predicting Future Instance Segmentations by Forecasting Convolutional Features , 2018, ECCV.

[12] Bingbing Ni,et al. Structure Preserving Video Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Yuke Li,et al. A Deep Spatiotemporal Perspective for Understanding Crowd Behavior , 2018, IEEE Transactions on Multimedia.

[15] Yun Fu,et al. Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[16] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.

[17] Zhanxing Zhu,et al. Spatio-temporal Graph Convolutional Neural Network: A Deep Learning Framework for Traffic Forecasting , 2017, IJCAI.

[18] Silvio Savarese,et al. Knowledge Transfer for Scene-Specific Motion Prediction , 2016, ECCV.

[19] Seunghoon Hong,et al. Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[21] Bin Yang,et al. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22] Roger B. Grosse,et al. Isolating Sources of Disentanglement in Variational Autoencoders , 2018, NeurIPS.

[23] Cyrus Shahabi,et al. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting , 2017, ICLR.

[24] Bernhard Schölkopf,et al. Wasserstein Auto-Encoders , 2017, ICLR.

[25] Xiaogang Wang,et al. Pedestrian Behavior Understanding and Prediction with Deep Neural Networks , 2016, ECCV.

[26] Shenghua Gao,et al. Encoding Crowd Interaction with Deep Neural Network for Pedestrian Trajectory Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Xiaogang Wang,et al. Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28] Nitish Srivastava,et al. Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[29] Jin Young Choi,et al. Visual Path Prediction in Complex Scenes with Crowded Moving Objects , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Kris M. Kitani,et al. Forecasting Interactive Dynamics of Pedestrians with Fictitious Play , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Silvio Savarese,et al. Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Lawrence Carin,et al. Symmetric Variational Autoencoder and Connections to Adversarial Learning , 2017, AISTATS.

[33] Silvio Savarese,et al. Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[34] Jianbo Shi,et al. Predicting Behaviors of Basketball Players from First Person Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Jan Kautz,et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[37] Yuke Li,et al. Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency , 2018, ACM Multimedia.

[38] Yoshua Bengio,et al. Mutual Information Neural Estimation , 2018, ICML.

[39] Martial Hebert,et al. Activity Forecasting , 2012, ECCV.

[40] Honglak Lee,et al. Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[41] Fei-Fei Li,et al. Socially-Aware Large-Scale Crowd Forecasting , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42] Silvio Savarese,et al. Social LSTM: Human Trajectory Prediction in Crowded Spaces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Christopher Joseph Pal,et al. Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.

[44] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[45] Nicholas Rhinehart,et al. First-Person Activity Forecasting with Online Inverse Reinforcement Learning , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[46] Sebastian Nowozin,et al. Which Training Methods for GANs do actually Converge? , 2018, ICML.

[47] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48] Pengpeng Zhao,et al. LC-RNN: A Deep Learning Model for Traffic Speed Prediction , 2018, IJCAI.

[49] Luc Van Gool,et al. You'll never walk alone: Modeling social behavior for multi-target tracking , 2009, 2009 IEEE 12th International Conference on Computer Vision.