Pose-RCNN: Joint object detection and pose estimation using 3D object proposals

This paper presents a novel approach for joint object detection and orientation estimation in a single deep convolutional neural network utilizing proposals calculated from 3D data. For orientation estimation, we extend a R-CNN like architecture by several carefully designed layers. Two new object proposal methods are introduced, to make use of stereo as well as lidar data. Our experiments on the KITTI dataset show that by combining proposals of both domains, high recall can be achieved while keeping the number of proposals low. Furthermore, our method for joint detection and orientation estimation outperforms state of the art approaches for cyclists on the easy test scenario of the KITTI test dataset.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  Bernt Schiele,et al.  Taking a deeper look at pedestrians , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Chengcui Zhang,et al.  A progressive morphological filter for removing nonground measurements from airborne LIDAR data , 2003, IEEE Trans. Geosci. Remote. Sens..

[5]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[8]  Markus Enzweiler,et al.  Efficient Stixel-based object recognition , 2012, 2012 IEEE Intelligent Vehicles Symposium.

[9]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Radu Bogdan Rusu,et al.  Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments , 2010, KI - Künstliche Intelligenz.

[12]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Dariu Gavrila,et al.  Context-Based Pedestrian Path Prediction , 2014, ECCV.

[14]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Peter V. Gehler,et al.  Multi-View and 3D Deformable Part Models , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[19]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2019, Computational Visual Media.

[20]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Silvio Savarese,et al.  Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[22]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[23]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Uwe Franke,et al.  Towards a Global Optimal Multi-Layer Stixel Representation of Dense 3D Data , 2011, BMVC.

[25]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Hui Xiong,et al.  A new benchmark for vision-based cyclist detection , 2016, 2016 IEEE Intelligent Vehicles Symposium (IV).

[27]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[28]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[29]  Dariu Gavrila,et al.  A Probabilistic Framework for Joint Pedestrian Head and Body Orientation Estimation , 2015, IEEE Transactions on Intelligent Transportation Systems.

[30]  Lucas Beyer,et al.  Biternion Nets: Continuous Head Pose Regression from Discrete Training Labels , 2015, GCPR.

[31]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.