Creating 3D Bounding Box Hypotheses From Deep Network Score-Maps

There are two common paradigms for indoor scene understanding: pixel-level labeling and bounding box generation. The two tasks are complementary in nature but are normally performed separately with different computational pipelines. We propose a novel method that bridges the two tasks by creating category-specific 3D bounding box hypotheses from the score maps of any deep network trained on pixel-level semantic labels, combined with depth data. These hypotheses can then be used to locate all objects as distinct, non-overlapping bounding boxes by incorporating high-level knowledge such as common room settings, co-existence, and co-exclusiveness. We develop an objective function that combines confidence scores with depth visibility to initialize and optimize multiple hypotheses for each category-specific score map. Experimental results show that our method significantly outperforms direct bounding box generation from pixel-level labels.
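To make the idea concrete, the sketch below scores a single axis-aligned 3D box hypothesis against one category-specific score map and a depth image. It is an illustration only: the function and parameter names (`box_objective`, `backproject`, `lam`) and the simple visibility proxy are assumptions, not the paper's actual formulation, which combines confidence scores with depth visibility as described above.

```python
# Illustrative sketch only; the paper's exact objective is not reproduced here.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project every pixel of a depth map to 3D camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # (H, W, 3)

def box_objective(box_min, box_max, score_map, depth, fx, fy, cx, cy, lam=0.5):
    """Score one axis-aligned 3D box hypothesis for one category.

    Term 1 rewards the summed network confidence of pixels whose back-projected
    3D points fall inside the box; term 2 penalizes pixels inside the box's
    image footprint whose observed depth lies beyond the box (visible free
    space where the hypothesis claims solid extent) -- a crude stand-in for
    the depth-visibility term.
    """
    points = backproject(depth, fx, fy, cx, cy)
    inside = np.all((points >= box_min) & (points <= box_max), axis=-1)
    confidence = score_map[inside].sum()

    # 2D footprint of the box: project its eight corners and take their bounds.
    corners = np.array([[x, y, z] for x in (box_min[0], box_max[0])
                                  for y in (box_min[1], box_max[1])
                                  for z in (box_min[2], box_max[2])])
    us = fx * corners[:, 0] / corners[:, 2] + cx
    vs = fy * corners[:, 1] / corners[:, 2] + cy
    u, v = np.meshgrid(np.arange(depth.shape[1]), np.arange(depth.shape[0]))
    footprint = ((u >= us.min()) & (u <= us.max()) &
                 (v >= vs.min()) & (v <= vs.max()))
    see_through = footprint & (depth > box_max[2])  # camera sees past the box
    visibility_penalty = see_through.sum()

    return confidence - lam * visibility_penalty
```

In this sketch, a hypothesis could be initialized from a high-confidence region of the score map and refined by any derivative-free search over the six box parameters that maximizes `box_objective`; the actual initialization and optimization scheme is the one described in the paper.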
