Three-Dimensional Target Detection Based on RGB-D Data

Received: 15 September 2020; Accepted: 25 January 2021

Current three-dimensional (3D) target detection models have low accuracy, because the two-dimensional (2D) image detector they rely on can only partially represent the surface information of the target. To solve this problem, this paper studies 3D target detection in RGB-D data of indoor scenes, and modifies the frustum PointNet (F-PointNet), a model that performs well in point cloud data processing, to detect indoor targets such as sofas, chairs, and beds. The 2D image detector of F-PointNet was replaced with You Only Look Once (YOLO) v3 and with the faster region-based convolutional neural network (Faster R-CNN), respectively. The two resulting F-PointNet models were then compared on the SUN RGB-D dataset. The results show that the model with YOLO v3 performed better in target detection, with a clear advantage in mean average precision (>6.27).
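To illustrate why the choice of 2D detector matters in a frustum-based pipeline, the following is a minimal sketch of how a 2D bounding box can select the 3D points passed on to the point cloud network. It is not the authors' implementation. The function name `frustum_points`, the pinhole intrinsic matrix `K`, and the camera-coordinate convention (z pointing forward) are assumptions made for illustration only.

```python
import numpy as np


def frustum_points(points_cam, K, box_2d):
    """Keep the 3D points whose image projection falls inside a 2D box.

    points_cam: (N, 3) array of points in camera coordinates
                (assumed convention: x right, y down, z forward).
    K:          (3, 3) pinhole intrinsic matrix (assumed, no lens distortion).
    box_2d:     (xmin, ymin, xmax, ymax) in pixels, e.g. from a 2D detector.
    """
    xmin, ymin, xmax, ymax = box_2d

    # Points behind the camera cannot be seen, so they are excluded.
    in_front = points_cam[:, 2] > 0

    # Project every point with the pinhole model: [u*z, v*z, z]^T = K [x, y, z]^T.
    uvw = points_cam @ K.T

    # Use a dummy depth of 1 for excluded points to avoid dividing by zero.
    z = np.where(in_front, uvw[:, 2], 1.0)
    u = uvw[:, 0] / z
    v = uvw[:, 1] / z

    inside = in_front & (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax)
    return points_cam[inside]
```

Under these assumptions, a looser or misplaced 2D box changes which points enter the frustum, so errors made by the 2D detector carry over directly to the 3D stage.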
