A Multimodal and Hybrid Framework for Human Navigational Intent Inference

Understanding human navigational intent is essential for robots to interact with and navigate around humans safely and naturally. Current methods typically perform inference using a single perceptual modality, such as the human motion trajectory, and a single theoretical framework, either learning-based or classical. In contrast, this paper studies the prediction of human navigational intent using multimodal perception within a hybrid framework. Our framework consists of two modules: a) a learning-based prediction module that predicts a human’s future goal position, and b) a reconstruction module, inspired by classical control theory, that uses the predicted goal position to reconstruct a possible future trajectory or a set of possible future positions. For the prediction module, we propose an end-to-end hybrid LSTM-CNN neural network that predicts a human’s future position in the real world from human motion, body pose, and head orientation. Although this visual information is captured from an egocentric perspective, the predictions are expressed in world coordinates, which robotic navigation and planning algorithms require. For the reconstruction module, we present two control-theoretic methods for reconstructing a human’s possible future trajectories: trajectory generation for differentially flat systems and reachability analysis. We evaluate our framework on a newly collected dataset, SFU-Store-Nav. Experimental results show that our method outperforms various baselines, especially when relatively little data is available.
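The flatness-based reconstruction step can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the walking human is modeled as a unicycle with state (x, y, θ), whose flat outputs are the planar position (x, y); each flat output follows a cubic polynomial from the current position and velocity to the predicted goal; and the human arrives at the goal at rest after a chosen horizon. Heading, speed, and turn rate are then recovered algebraically from the flat outputs and their derivatives, with no numerical integration of the dynamics. The names `Cubic`, `cubic_between`, and `flat_trajectory`, and the example horizon, are hypothetical; the paper's actual parameterization may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cubic:
    """One flat output: p(t) = a0 + a1*t + a2*t^2 + a3*t^3."""
    a0: float
    a1: float
    a2: float
    a3: float

    def pos(self, t: float) -> float:
        return self.a0 + self.a1 * t + self.a2 * t**2 + self.a3 * t**3

    def vel(self, t: float) -> float:
        return self.a1 + 2 * self.a2 * t + 3 * self.a3 * t**2

    def acc(self, t: float) -> float:
        return 2 * self.a2 + 6 * self.a3 * t


def cubic_between(p0: float, v0: float, pf: float, vf: float, T: float) -> Cubic:
    """Cubic matching position and velocity at t = 0 and t = T."""
    a2 = (3 * (pf - p0) - (2 * v0 + vf) * T) / T**2
    a3 = (2 * (p0 - pf) + (v0 + vf) * T) / T**3
    return Cubic(p0, v0, a2, a3)


# (t, x, y, heading or None, speed, turn rate)
Sample = Tuple[float, float, float, Optional[float], float, float]


def flat_trajectory(start, start_vel, goal, horizon, goal_vel=(0.0, 0.0),
                    steps=10) -> List[Sample]:
    """Hypothetical helper: unicycle trajectory from the current state to a predicted goal."""
    cx = cubic_between(start[0], start_vel[0], goal[0], goal_vel[0], horizon)
    cy = cubic_between(start[1], start_vel[1], goal[1], goal_vel[1], horizon)
    samples: List[Sample] = []
    for k in range(steps + 1):
        t = horizon * k / steps
        dx, dy = cx.vel(t), cy.vel(t)
        ddx, ddy = cx.acc(t), cy.acc(t)
        speed = math.hypot(dx, dy)
        if speed > 1e-9:
            # Unicycle states/inputs recovered from the flat outputs' derivatives.
            heading = math.atan2(dy, dx)
            turn_rate = (dx * ddy - dy * ddx) / speed**2
        else:
            # Heading is undefined while the human is momentarily at rest.
            heading, turn_rate = None, 0.0
        samples.append((t, cx.pos(t), cy.pos(t), heading, speed, turn_rate))
    return samples


# Example: walking along +x at 1 m/s, predicted goal at (3, 2) m, reached after 4 s.
traj = flat_trajectory(start=(0.0, 0.0), start_vel=(1.0, 0.0), goal=(3.0, 2.0), horizon=4.0)
```

Cubics are the lowest-degree polynomials that can match both endpoint positions and velocities. Higher-degree polynomials or splines would allow acceleration constraints or intermediate waypoints at the cost of more boundary conditions.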
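The reachability-based reconstruction can likewise be sketched, although the abstract does not specify the dynamics or the solver (Hamilton-Jacobi reachability is one common choice). This sketch swaps in a much simpler discrete approximation. It assumes the human is a point on an occupancy grid with bounded speed, so that in one time step the human can stay put or move to any free cell within `radius` cells, with `radius` chosen as roughly v_max·Δt divided by the cell size. Intersecting the forward reachable set from the current cell with the set of cells from which the predicted goal can still be reached in the remaining time gives the positions consistent with reaching that goal. Because the moves are symmetric, the backward set is computed as a forward set from the goal. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def disc_offsets(radius: int) -> List[Cell]:
    """Grid moves of Euclidean length <= radius cells, including staying in place."""
    return [
        (dx, dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
        if dx * dx + dy * dy <= radius * radius
    ]


def forward_reachable(start: Cell, steps: int, radius: int, width: int, height: int,
                      obstacles: Set[Cell]) -> List[Set[Cell]]:
    """sets[k] = free cells reachable from `start` within k time steps.

    For radius > 1 a single move may jump over an obstacle cell; a finer
    time step (radius = 1) avoids this at the cost of more iterations.
    """
    moves = disc_offsets(radius)
    sets: List[Set[Cell]] = [{start}]
    for _ in range(steps):
        nxt = set(sets[-1])
        for x, y in sets[-1]:
            for dx, dy in moves:
                cell = (x + dx, y + dy)
                if 0 <= cell[0] < width and 0 <= cell[1] < height and cell not in obstacles:
                    nxt.add(cell)
        sets.append(nxt)
    return sets


def goal_consistent_sets(start: Cell, goal: Cell, steps: int, radius: int, width: int,
                         height: int, obstacles: Set[Cell]) -> List[Set[Cell]]:
    """Cells the human could occupy at step k while still reaching `goal` by step `steps`."""
    fwd = forward_reachable(start, steps, radius, width, height, obstacles)
    # Moves are symmetric, so "can reach the goal within j steps" is the same
    # as "reachable from the goal within j steps".
    bwd = forward_reachable(goal, steps, radius, width, height, obstacles)
    return [fwd[k] & bwd[steps - k] for k in range(steps + 1)]


# Example: 5x3 grid, goal four cells to the right, exactly four steps available.
corridor = goal_consistent_sets((0, 0), (4, 0), steps=4, radius=1, width=5, height=3,
                                obstacles=set())
```

In the example the goal is exactly four moves away, so each set collapses to the single cell on the direct route; a larger step budget widens the sets, and an obstacle on the direct route pushes them around it.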
