PP-RCNN: Point-Pillars Feature Set Abstraction for 3D Real-time Object Detection

3D object detection in point cloud data is an important component of computer vision systems, especially for autonomous driving applications. Recent literature suggests two families of point cloud encoders: grid-based methods tend to be fast but sacrifice accuracy, while point-based methods learned from raw data are more accurate but slower. In this work, we present a novel real-time two-stage 3D object detection framework, named PointPillars-RCNN (PP-RCNN). In the first stage, we use a pillar-based network to encode the point cloud and generate high-quality 3D proposals. Benefiting from the pillar network, our framework achieves real-time detection. In the second stage, we use the Point-Pillars Feature Set Abstraction (PPSA) module to extract point-based features from the raw point cloud and the pillar features, and then apply RoI-grid feature abstraction for proposal refinement. The entire detection pipeline is trained end-to-end. Extensive experiments on the KITTI benchmark show that our approach outperforms the one-stage PointPillars algorithm and runs faster than current two-stage state-of-the-art algorithms.
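To make the first-stage pillar encoding concrete, the following is a minimal, simplified sketch of how points are binned into vertical pillars on the x-y plane and pooled into a dense 2D feature grid. All names, grid ranges, and the hand-crafted per-pillar features (point count, mean height) are illustrative assumptions; the actual PointPillars encoder learns per-point features with a small PointNet before max-pooling per pillar.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Bin (N, 3) points into vertical pillars and pool simple features.

    Hypothetical, simplified stand-in for the learned pillar encoder:
    each pillar stores [point count, mean z] instead of PointNet features.
    """
    nx = int((x_range[1] - x_range[0]) / pillar_size)
    ny = int((y_range[1] - y_range[0]) / pillar_size)
    grid = np.zeros((nx, ny, 2))  # channels: [count, mean height]

    # Compute each point's pillar index on the x-y plane.
    ix = ((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = ((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    # Accumulate a running mean of z per pillar.
    for i, j, z in zip(ix[inside], iy[inside], points[inside, 2]):
        n = grid[i, j, 0]
        grid[i, j, 1] = (grid[i, j, 1] * n + z) / (n + 1)
        grid[i, j, 0] = n + 1
    return grid
```

Because the pillar grid is a fixed-size 2D tensor, it can be fed to a standard 2D convolutional backbone, which is what makes this encoding fast relative to point-based feature extraction.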