OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

Semantic occupancy perception is essential for autonomous driving, as automated vehicles require fine-grained perception of 3D urban structures. However, existing benchmarks lack diversity in urban scenes and evaluate only front-view predictions. To enable comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark. In OpenOccupancy, we extend the large-scale nuScenes dataset with dense semantic occupancy annotations. Previous annotations rely on superimposing LiDAR points, which misses some occupancy labels because the LiDAR channels are sparse. To mitigate this problem, we introduce the Augmenting And Purifying (AAP) pipeline, which densifies the annotations by ~2x, with ~4000 human hours spent on the labeling process. In addition, we establish camera-based, LiDAR-based and multi-modal baselines for the OpenOccupancy benchmark. Furthermore, because the main difficulty of surrounding occupancy perception lies in the computational burden of high-resolution 3D predictions, we propose the Cascade Occupancy Network (CONet) to refine the coarse prediction, which yields a relative performance improvement of ~30% over the baseline. We hope the OpenOccupancy benchmark will boost the development of surrounding occupancy perception algorithms.
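To illustrate why labels obtained purely by superimposing LiDAR points can be sparse, the following is a minimal sketch of voxelizing several labeled sweeps into a semantic occupancy grid. It is not the AAP pipeline; the function name `voxelize_sweeps`, its parameters, and the convention that label 0 means "free/unknown" are all assumptions made for illustration, and the sweeps are assumed to be already aligned in a common coordinate frame.

```python
import numpy as np

def voxelize_sweeps(sweeps, labels, grid_min, voxel_size, grid_shape, free_label=0):
    """Superimpose labeled LiDAR sweeps into one semantic voxel grid.

    Hypothetical helper for illustration only (not the AAP pipeline).
    sweeps: list of (N_i, 3) float arrays in a common frame.
    labels: list of (N_i,) integer arrays with per-point semantic labels.
    """
    grid = np.full(grid_shape, free_label, dtype=np.int64)
    points = np.concatenate(sweeps, axis=0)
    point_labels = np.concatenate(labels, axis=0)

    # Map each point to the integer index of the voxel that contains it.
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(np.int64)

    # Discard points that fall outside the grid volume.
    in_bounds = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx, point_labels = idx[in_bounds], point_labels[in_bounds]

    # A voxel hit by several points keeps the label of the last point written.
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = point_labels
    return grid

def occupied_ratio(grid, free_label=0):
    """Fraction of voxels that received any label."""
    return float(np.mean(grid != free_label))
```

Voxels that no LiDAR return happens to hit stay at `free_label` even if the underlying surface is occupied, so the occupied ratio of such a grid gives only a lower bound on true occupancy. That gap is the kind of missing label that a densification step is meant to address.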
