Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes

Self-driving vehicles must perceive and predict the future positions of nearby actors in order to avoid collisions and drive safely. A learned deep learning module is often responsible for this task, requiring large-scale, highquality training datasets. As data collection is often significantly cheaper than labeling in this domain, the decision of which subset of examples to label can have a profound impact on model performance. Active learning techniques, which leverage the state of the current model to iteratively select examples for labeling, offer a promising solution to this problem. However, despite the appeal of this approach, there has been little scientific analysis of active learning approaches for the perception and prediction (P&P) problem. In this work, we study active learning techniques for P&P and find that the traditional active learning formulation is ill-suited for the P&P setting. We thus introduce generalizations that ensure that our approach is both cost-aware and allows for fine-grained selection of examples through partially labeled scenes. Our experiments on a real-world, large-scale self-driving dataset suggest that fine-grained selection can improve the performance across perception, prediction, and downstream planning tasks.

[1]  Andreas Nürnberger,et al.  The Power of Ensembles for Active Learning in Image Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  C. V. Jawahar,et al.  Region-based active learning for efficient labeling in semantic segmentation , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[3]  Zoubin Ghahramani,et al.  Deep Bayesian Active Learning with Image Data , 2017, ICML.

[4]  Ersin Yumer,et al.  Diverse Complexity Measures for Dataset Curation in Self-Driving , 2021, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[5]  Simon Lucey,et al.  Argoverse: 3D Tracking and Forecasting With Rich Maps , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[7]  Ersin Yumer,et al.  Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints , 2019, ICLR.

[8]  Ersin Yumer,et al.  Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs , 2020, CoRL.

[9]  Sotaro Tsukizawa,et al.  Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Pietro Perona,et al.  Entropy-based active learning for object recognition , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[11]  Trevor Darrell,et al.  Variational Adversarial Active Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Carsten Rother,et al.  CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation , 2018, BMVC.

[13]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Hugh F. Durrant-Whyte,et al.  On entropy approximation for Gaussian mixture random vectors , 2008, 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems.

[15]  Michele Fenzi,et al.  Scalable Active Learning for Object Detection , 2020, 2020 IEEE Intelligent Vehicles Symposium (IV).

[16]  Jenq-Neng Hwang,et al.  Uncertainty-Based Active Learning via Sparse Modeling for Image Classification , 2019, IEEE Transactions on Image Processing.

[17]  Ruimao Zhang,et al.  Cost-Effective Active Learning for Deep Image Classification , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[18]  Elena Corina Grigore,et al.  CoverNet: Multimodal Behavior Prediction Using Trajectory Sets , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  John Langford,et al.  Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds , 2019, ICLR.

[20]  Silvio Savarese,et al.  Active Learning for Convolutional Neural Networks: A Core-Set Approach , 2017, ICLR.

[21]  Renjie Liao,et al.  Discrete Residual Flow for Probabilistic Pedestrian Behavior Prediction , 2019, CoRL.

[22]  Vinay P. Namboodiri,et al.  Deep active learning for object detection , 2018, BMVC.

[23]  Henggang Cui,et al.  Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[24]  Yuhong Guo,et al.  Active Instance Sampling via Matrix Partition , 2010, NIPS.

[25]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[26]  Ruslan Salakhutdinov,et al.  Multiple Futures Prediction , 2019, NeurIPS.

[27]  Arnold W. M. Smeulders,et al.  Active learning using pre-clustering , 2004, ICML.

[28]  Himanshu Arora,et al.  Contextual Diversity for Active Learning , 2020, ECCV.

[29]  Raquel Urtasun,et al.  Latent Structured Active Learning , 2013, NIPS.

[30]  Bin Yang,et al.  Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Geunseob Oh,et al.  HCNAF: Hyper-Conditioned Neural Autoregressive Flow and its Application for Probabilistic Occupancy Map Forecasting , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Klaus C. J. Dietmayer,et al.  Deep Active Learning for Efficient Training of a LiDAR 3D Object Detector , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[33]  Sergio Casas,et al.  Implicit Latent Variable Model for Scene-Consistent Motion Forecasting , 2020, ECCV.

[34]  Mohan M. Trivedi,et al.  Active learning for on-road vehicle detection: a comparative study , 2014, Machine Vision and Applications.

[35]  In So Kweon,et al.  Learning Loss for Active Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Sergio Casas,et al.  IntentNet: Learning to Predict Intention from Raw Sensor Data , 2018, CoRL.