Pedestrian Intention Prediction Based on Traffic-Aware Scene Graph Model

Anticipating the future behavior of pedestrians is crucial for deploying Automated Driving Systems (ADS) in urban traffic scenarios. Most recent works use a convolutional neural network (CNN) to extract visual information, which is then fed to a recurrent neural network (RNN), along with pedestrian-specific features such as location and speed, to obtain temporal features. However, most of these approaches cannot parse the relationships among the relevant objects in a given traffic scene, and therefore omit both the interactions among pedestrians and the interactions between pedestrians and traffic. To address this, we propose a graph-structured model that uncovers pedestrians' dynamic constraints by constructing a traffic-aware scene graph within each frame. In addition, to capture pedestrian movement more effectively, we introduce a temporal feature representation model, which first uses an inter-frame and intra-frame GRU (II-GRU) to mine inter-frame and intra-frame information jointly, and then employs a novel attention mechanism to adaptively generate attention weights. Extensive experiments on the JAAD and PIE datasets show that the proposed model is effective, reaching and improving on state-of-the-art performance.
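
As a rough illustration of the two ideas named above (a per-frame scene graph around a pedestrian, and attention weights over per-frame features), the following minimal Python sketch may help. It is not the model proposed in this work: the function names `build_scene_graph` and `attention_pool`, the distance-based edge weighting, the `radius` parameter, and the dot-product scoring are all assumptions chosen for illustration, and the II-GRU itself is not reproduced.

```python
import math


def build_scene_graph(target, objects, radius=200.0):
    """Connect the target pedestrian to nearby scene objects in one frame.

    `target` and each entry of `objects` are dicts with an 'id', a 'kind'
    and a 'center' (x, y) in pixels. The inverse-distance edge weight and
    the `radius` cutoff are illustrative assumptions, not the paper's rule.
    """
    edges = {}
    for obj in objects:
        dx = obj["center"][0] - target["center"][0]
        dy = obj["center"][1] - target["center"][1]
        dist = math.hypot(dx, dy)
        if dist <= radius:
            # Closer objects receive larger weights.
            edges[obj["id"]] = 1.0 / (1.0 + dist)
    return edges


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Pool per-frame feature vectors with dot-product attention weights.

    Each frame is scored against `query` (a hypothetical learned vector),
    scores are normalized with softmax, and the frames are averaged with
    those weights.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    pedestrian = {"id": "p0", "kind": "pedestrian", "center": (100, 100)}
    scene = [
        {"id": "car1", "kind": "vehicle", "center": (103, 104)},
        {"id": "light", "kind": "traffic_light", "center": (1000, 1000)},
    ]
    print(build_scene_graph(pedestrian, scene))
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In this sketch the graph is rebuilt independently for every frame, and the attention step runs afterwards over whatever per-frame features a temporal encoder produces; a trained model would learn the edge weights and the query rather than fix them by hand.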
