Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving

3D object detection has recently received much attention due to its great potential in autonomous vehicle (AV). The success of deep learning based object detectors relies on the availability of large-scale annotated datasets, which is time-consuming and expensive to compile, especially for 3D bounding box annotation. In this work, we investigate diversity-based active learning (AL) as a potential solution to alleviate the annotation burden. Given limited annotation budget, only the most informative frames and objects are automatically selected for human to annotate. Technically, we take the advantage of the multimodal information provided in an AV dataset, and propose a novel acquisition function that enforces spatial and temporal diversity in the selected samples. We benchmark the proposed method against other AL strategies under realistic annotation cost measurement, where the realistic costs for annotating a frame and a 3D bounding box are both taken into consideration. We demonstrate the effectiveness of the proposed method on the nuScenes dataset and show that it outperforms existing AL strategies significantly.

[1]  Zhihui Li,et al.  A Survey of Deep Active Learning , 2020, ACM Comput. Surv..

[2]  Lining Zhang,et al.  Exploring Spatial Diversity for Region-Based Active Learning , 2021, IEEE Transactions on Image Processing.

[3]  Chuan-Sheng Foo,et al.  Revisiting Superpixels for Active Learning in Semantic Segmentation with Realistic Annotation Costs , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Xiangyang Ji,et al.  Multiple Instance Active Learning for Object Detection , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Philipp Krähenbühl,et al.  Center-based 3D Object Detection and Tracking , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Hsuan-Tien Lin,et al.  Cold-start Active Learning through Self-Supervised Language Modeling , 2020, EMNLP.

[7]  Vineeth N. Balasubramanian,et al.  Towards Fine-grained Sampling for Active Learning in Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  Yanan Sun,et al.  3DSSD: Point-Based 3D Single Stage Object Detector , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  A. Yuille,et al.  Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots , 2019, ECCV.

[10]  Alex H. Lang,et al.  PointPainting: Sequential Fusion for 3D Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Larry S. Davis,et al.  Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Cost , 2019, ECCV.

[12]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Alan L. Yuille,et al.  Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization , 2020, NeurIPS.

[14]  Joost van de Weijer,et al.  Active Learning for Deep Detection Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Benjin Zhu,et al.  Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection , 2019, ArXiv.

[16]  Sheng-Jun Huang,et al.  Self-Paced Active Learning: Query the Right Thing at the Right Time , 2019, AAAI.

[17]  In So Kweon,et al.  Learning Loss for Active Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Leonidas J. Guibas,et al.  Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Trevor Darrell,et al.  Variational Adversarial Active Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Zhixin Wang,et al.  Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[21]  Klaus C. J. Dietmayer,et al.  Deep Active Learning for Efficient Training of a LiDAR 3D Object Detector , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[22]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Joachim Denzler,et al.  Active Learning for Deep Object Detection , 2018, VISIGRAPP.

[25]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[26]  Carsten Rother,et al.  CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation , 2018, BMVC.

[27]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[28]  Andreas Nürnberger,et al.  The Power of Ensembles for Active Learning in Image Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Ming-Yu Liu,et al.  Localization-Aware Active Learning for Object Detection , 2018, ACCV.

[30]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Silvio Savarese,et al.  Active Learning for Convolutional Neural Networks: A Core-Set Approach , 2017, ICLR.

[33]  Lei Zhang,et al.  Active Self-Paced Learning for Cost-Effective and Progressive Face Identification , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Vinay P. Namboodiri,et al.  Deep active learning for object detection , 2018, BMVC.

[35]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[36]  Yarin Gal,et al.  Dropout Inference in Bayesian Neural Networks with Alpha-divergences , 2017, ICML.

[37]  Zoubin Ghahramani,et al.  Deep Bayesian Active Learning with Image Data , 2017, ICML.

[38]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[41]  Amit K. Roy-Chowdhury,et al.  Context Aware Active Learning of Activity Recognition Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Zoubin Ghahramani,et al.  Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference , 2015, ArXiv.

[43]  Yi Yang,et al.  Multi-Class Active Learning by Uncertainty Sampling with Diversity Maximization , 2015, International Journal of Computer Vision.

[44]  Allen Y. Yang,et al.  A Convex Optimization Framework for Active Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[45]  Zoubin Ghahramani,et al.  Bayesian Active Learning for Classification and Preference Learning , 2011, ArXiv.

[46]  Yuhong Guo,et al.  Active Instance Sampling via Matrix Partition , 2010, NIPS.

[47]  Nikolaos Papanikolopoulos,et al.  Multi-class active learning for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Pietro Perona,et al.  Entropy-based active learning for object recognition , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[49]  Dan Roth,et al.  Margin-Based Active Learning for Structured Output Spaces , 2006, ECML.