Important Object Identification with Semi-Supervised Learning for Autonomous Driving

Accurate identification of important objects in the scene is a prerequisite for safe and high-quality decision making and motion planning by intelligent agents (e.g., autonomous vehicles) that navigate complex and dynamic environments. Most existing approaches use attention mechanisms to learn per-object importance weights indirectly through other tasks (e.g., trajectory prediction), and therefore do not supervise the importance estimation directly. In contrast, we tackle this task explicitly and formulate it as a binary classification (“important” or “unimportant”) problem. We propose a novel approach for important object identification in egocentric driving scenarios that performs relational reasoning on the objects in the scene. Since human annotations are limited and expensive to obtain, we also present a semi-supervised learning pipeline that enables the model to learn from large amounts of unlabeled data. Moreover, we propose to leverage the auxiliary task of ego vehicle behavior prediction to further improve the accuracy of importance estimation. The proposed approach is evaluated on a public egocentric driving dataset (H3D) collected in complex traffic scenarios. A detailed ablation study demonstrates the effectiveness of each model component and of the training strategy. Our approach also outperforms rule-based baselines by a large margin.
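As an illustration of how a binary importance classifier might make use of unlabeled data, the sketch below shows confidence-thresholded pseudo-labeling, one common semi-supervised technique. The abstract does not state which mechanism the pipeline uses, so the function name `select_pseudo_labels`, the threshold value, and the example probabilities are all assumptions made for this sketch and are not taken from the paper.

```python
from typing import List, Tuple


def select_pseudo_labels(
    probs: List[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Pick confident predictions on unlabeled objects as pseudo-labels.

    probs[i] is a (hypothetical) model's predicted probability that
    unlabeled object i is "important". An object receives pseudo-label 1
    if the probability is at least `threshold`, pseudo-label 0 if it is
    at most 1 - threshold, and is skipped otherwise.
    """
    selected = []
    for index, p in enumerate(probs):
        if p >= threshold:
            selected.append((index, 1))  # confidently "important"
        elif p <= 1.0 - threshold:
            selected.append((index, 0))  # confidently "unimportant"
        # Uncertain predictions are left out of the pseudo-labeled set.
    return selected


# Illustrative probabilities for five unlabeled objects (made-up values).
unlabeled_probs = [0.97, 0.55, 0.04, 0.80, 0.20]
print(select_pseudo_labels(unlabeled_probs, threshold=0.9))
```

In a scheme of this kind, the selected pairs would be added to the labeled set for the next training round, while uncertain objects stay unlabeled; a stricter threshold trades pseudo-label coverage for lower label noise.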
