M2P3: multimodal multi-pedestrian path prediction by self-driving cars with egocentric vision

Accurate prediction of the future positions of pedestrians in traffic scenarios is required for safe navigation of an autonomous vehicle but remains a challenge. This concerns, in particular, the effective and efficient multimodal prediction of the most likely trajectories of tracked pedestrians from the egocentric view of a self-driving car. In this paper, we present a novel solution, named M2P3, which combines a conditional variational autoencoder with a recurrent neural network encoder-decoder architecture in order to predict a set of possible future locations of each pedestrian in a traffic scene. The M2P3 system uses a sequence of RGB images delivered through an internal vehicle-mounted camera for egocentric vision. It takes only two modes as input, namely the past trajectories and scales of pedestrians, and delivers as output the three most likely paths for each tracked pedestrian. Experimental evaluation of the proposed architecture on the JAAD and ETH/UCY datasets reveals that the M2P3 system is significantly superior to selected state-of-the-art solutions.
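To make the data flow described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each past observation is a triple (x, y, scale), uses a plain tanh recurrent cell with untrained random weights, and draws each latent sample from a standard normal prior, as is common for conditional variational autoencoders at test time. All names (ToyCvaePredictor, rnn_step, hidden, latent, horizon, k) are hypothetical, and the sketch only draws k = 3 candidate paths; it does not rank them by likelihood.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix; stands in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_h, x, h):
    """One step of a simple tanh recurrent cell."""
    a, b = matvec(w_in, x), matvec(w_h, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


class ToyCvaePredictor:
    """Hypothetical encoder-decoder with a sampled latent variable."""

    def __init__(self, hidden=8, latent=4, seed=0):
        self.rng = random.Random(seed)
        self.hidden, self.latent = hidden, latent
        self.enc_in = make_matrix(hidden, 3, self.rng)   # input: x, y, scale
        self.enc_h = make_matrix(hidden, hidden, self.rng)
        self.dec_in = make_matrix(hidden, 2, self.rng)   # input: x, y
        self.dec_h = make_matrix(hidden, hidden, self.rng)
        self.init = make_matrix(hidden, hidden + latent, self.rng)
        self.out = make_matrix(2, hidden, self.rng)      # output: dx, dy

    def encode(self, past):
        """Summarise the observed (x, y, scale) sequence into one vector."""
        h = [0.0] * self.hidden
        for obs in past:
            h = rnn_step(self.enc_in, self.enc_h, list(obs), h)
        return h

    def decode(self, context, z, start, horizon):
        """Roll out future positions conditioned on context and latent z."""
        h = [math.tanh(v) for v in matvec(self.init, context + z)]
        pos, path = list(start), []
        for _ in range(horizon):
            h = rnn_step(self.dec_in, self.dec_h, pos, h)
            dx, dy = matvec(self.out, h)
            pos = [pos[0] + dx, pos[1] + dy]
            path.append((pos[0], pos[1]))
        return path

    def predict(self, past, horizon=5, k=3):
        """Draw k latent samples from the prior and decode one path per sample."""
        context = self.encode(past)
        start = past[-1][:2]
        paths = []
        for _ in range(k):
            z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent)]
            paths.append(self.decode(context, z, start, horizon))
        return paths


# Three observed steps of one pedestrian: (x, y, scale), made-up values.
past = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.05), (0.2, 0.05, 1.1)]
paths = ToyCvaePredictor().predict(past, horizon=5, k=3)
```

Here `paths` holds three candidate futures of five positions each for a single pedestrian. Because every candidate is decoded from a different latent sample, the candidates can diverge, which is what allows one model to express several plausible futures for the same observed past.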
