Vision based 3D Object Detection using Deep Learning: Methods with Challenges and Applications towards Future Directions

For autonomous intelligent systems, 3D object detection can serve as a basis for decision making by providing information such as an object's size, position, and direction, from which the system perceives its surrounding environment. Robust 3D object detection could strongly impact the robotics industry and the augmented and virtual reality sectors in the context of the Fourth Industrial Revolution (IR4.0). Recently, deep learning has become a promising approach to 3D object detection because it learns powerful semantic object features for tasks such as depth map construction, segmentation, and classification. As a result, the number of proposed methods has grown rapidly in recent years. Although a good number of efforts have been made to address 3D object detection, an in-depth and critical review from different viewpoints is still lacking. Consequently, comparing methods remains challenging, even though such comparison is important for selecting a method for a particular application. Given the strong heterogeneity of previous methods, this research aims to analyze and systematize existing research according to challenges and methodologies from different viewpoints, in order to guide future development and evaluation by bridging the gaps among methods that use various sensors, i.e., cameras, LiDAR, and Pseudo-LiDAR. First, this research presents a critical analysis of existing sophisticated methods by identifying six key areas based on current scenarios, challenges, and significant problems to be addressed. Next, it presents a comprehensive analysis for validating 3D object detection methods based on eight authoritative 3D detection benchmark datasets, chosen according to dataset size, and eight validation metrics. Finally, insights into existing challenges are presented as future directions. Overall, the extensive review presented in this research can contribute significantly to further investigation of multimodal 3D object detection.
