Towards 3D object detection with bimodal deep Boltzmann machines over RGBD imagery

Nowadays, detecting objects in 3D scenes like point clouds has become an emerging challenge with various applications. However, it retains as an open problem due to the deficiency of labeling 3D training data. To deploy an accurate detection algorithm typically resorts to investigating both RGB and depth modalities, which have distinct statistics while correlated with each other. Previous research mainly focus on detecting objects using only one modality, which ignores exploiting the cross-modality cues. In this work, we propose a cross-modality deep learning framework based on deep Boltzmann Machines for 3D Scenes object detection. In particular, we demonstrate that by learning cross-modality feature from RGBD data, it is possible to capture their joint information to reinforce detector trainings in individual modalities. In particular, we slide a 3D detection window in the 3D point cloud to match the exemplar shape, which the lack of training data in 3D domain is conquered via (1) We collect 3D CAD models and 2D positive samples from Internet. (2) adopt pretrained R-CNNs [2] to extract raw feature from both RGB and Depth domains. Experiments on RMRC dataset demonstrate that the bimodal based deep feature learning framework helps 3D scene object detection.

[1]  Lars Petersson,et al.  Non-associative Higher-Order Markov Networks for Point Cloud Classification , 2014, ECCV.

[2]  Martial Hebert,et al.  3-D scene analysis via sequenced predictions over points and regions , 2011, 2011 IEEE International Conference on Robotics and Automation.

[3]  Abhinav Gupta,et al.  Building Part-Based Object Detectors via 3D Geometry , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[5]  Thorsten Joachims,et al.  Contextually guided semantic labeling and search for three-dimensional point clouds , 2013, Int. J. Robotics Res..

[6]  Rongrong Ji,et al.  Label Propagation from ImageNet to 3D Point Clouds , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[8]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[9]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[10]  Thorsten Joachims,et al.  Semantic Labeling of 3D Point Clouds for Indoor Scenes , 2011, NIPS.

[11]  S. Song,et al.  Sliding Shapes for 3 D Object Detection in RGB-D Images , 2014 .

[12]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[13]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[14]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Derek Hoiem,et al.  Support Surface Prediction in Indoor Scenes , 2013, 2013 IEEE International Conference on Computer Vision.

[16]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[17]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[18]  Dieter Fox,et al.  Unsupervised Feature Learning for RGB-D Based Object Recognition , 2012, ISER.

[19]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[20]  David Haussler,et al.  Unsupervised learning of distributions on binary vectors using two layer networks , 1991, NIPS 1991.

[21]  Klaus D. Tönnies,et al.  Detection of Clustered Objects in Sparse Point Clouds Through 2D Classification and Quadric Filtering , 2014, GCPR.

[22]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[23]  Olaf Kähler,et al.  Efficient 3D Scene Labeling Using Fields of Trees , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Gang Wang,et al.  Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling , 2014, ECCV.

[25]  Dieter Fox,et al.  Detection-based object labeling in 3D scenes , 2012, 2012 IEEE International Conference on Robotics and Automation.