NEMO: Future Object Localization Using Noisy Ego Priors

Predicting the future trajectories of agents from visual observations is essential for the safe and effective navigation of autonomous systems in dynamic environments. This paper focuses on two aspects of future trajectory forecasting that are particularly relevant to mobile platforms: 1) modeling the uncertainty of predictions from egocentric views, where uncertainty in the interactive reactions and behaviors of other agents must account for the uncertainty in the ego-motion, and 2) modeling the multi-modal nature of the problem, which is particularly prevalent at junctions in urban traffic scenes. To address these problems in a unified framework, we propose NEMO (Noisy Ego MOtion priors for future object localization) for forecasting the future locations of agents in the egocentric view. In the proposed approach, a predictive distribution over future trajectories is jointly modeled with the uncertainty of the predictions. To this end, we divide the problem into two tasks: future ego-motion prediction and future object localization. We first model the multi-modal distribution of future ego-motion with uncertainty estimates. The resulting distribution of ego-behavior is used to sample multiple modes of future ego-motion. Each mode then serves as a prior for understanding the interactions between the ego-vehicle and the target agent. We predict the multi-modal future locations of the target from the individual ego-motion modes while modeling the uncertainty of the target's behavior. We extensively evaluate the proposed framework on the publicly available HEV-I benchmark dataset, supplemented with odometry data from an Inertial Measurement Unit (IMU).
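The two-stage structure described above (sample multiple ego-motion modes, then condition target localization on each mode) can be sketched in a minimal, purely illustrative form. The functions `predict_ego_modes` and `localize_target` below are hypothetical stand-ins with toy constant-velocity dynamics, not the paper's actual networks; only the control flow (K ego-motion hypotheses, each producing a target forecast with growing per-step uncertainty) mirrors the approach described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON = 10  # number of future time steps to predict


def predict_ego_modes(past_ego, k=3):
    """Toy stand-in for the multi-modal ego-motion model.

    Returns k candidate future ego-motion sequences (each HORIZON x 2)
    by perturbing a constant-velocity rollout, mimicking samples drawn
    from a learned predictive distribution over ego-behavior.
    """
    base = past_ego[-1] + np.cumsum(np.full((HORIZON, 2), 0.5), axis=0)
    return [base + rng.normal(0.0, 0.2, size=base.shape) for _ in range(k)]


def localize_target(past_target, ego_mode):
    """Toy future object localization conditioned on one ego-motion mode.

    Predicts target mean positions (constant velocity, compensated by the
    ego mode) and a per-step standard deviation that grows with the
    prediction horizon, standing in for learned aleatoric uncertainty.
    """
    vel = past_target[-1] - past_target[-2]
    mean = (past_target[-1]
            + np.cumsum(np.tile(vel, (HORIZON, 1)), axis=0)
            - 0.1 * ego_mode)
    sigma = 0.1 * (1 + np.arange(HORIZON))[:, None] * np.ones((1, 2))
    return mean, sigma


# Observed past trajectories (5 steps each, 2-D image-plane coordinates).
past_ego = np.zeros((5, 2))
past_target = np.arange(5, dtype=float)[:, None] * np.ones((1, 2))

# Stage 1: sample multiple modes of future ego-motion.
ego_modes = predict_ego_modes(past_ego, k=3)

# Stage 2: one multi-modal target forecast per ego-motion hypothesis.
hypotheses = [localize_target(past_target, mode) for mode in ego_modes]
```

Each element of `hypotheses` is a `(mean, sigma)` pair, so downstream code can score or visualize every ego-conditioned forecast along with its uncertainty.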
