Effective 3D object detection based on detector and tracker

In recent years, understanding the semantics of 3D scenes has attracted wide research interest in many applications. However, 3D scene detection still faces many problems, owing to the difficulty of acquiring enough 3D models to train effective classifiers. To address these problems, we first publish a new real-world 3D model dataset, MV-RED, which includes 505 objects recorded by a Kinect camera. We then propose a novel approach to 3D object detection in real-world scenes that incorporates RGB images and is built on the MV-RED dataset. To improve detection precision, we also apply a tracking method to refine the detection results. Finally, we evaluate our approach on the RGB-D dataset provided by Lai et al. (2012) [20], achieving much greater efficiency and comparable accuracy. Our approach shows further major gains in accuracy when training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

[1] Xiangyu Wang, et al. 3D Model Retrieval with Weighted Locality-constrained Group Sparse Coding, 2015, Neurocomputing.

[2] Dieter Fox, et al. A large-scale hierarchical multi-view RGB-D object dataset, 2011, 2011 IEEE International Conference on Robotics and Automation.

[3] David A. McAllester, et al. Object Detection with Discriminatively Trained Part Based Models, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Thorsten Joachims, et al. Semantic Labeling of 3D Point Clouds for Indoor Scenes, 2011, NIPS.

[5] Bo Fu, et al. Automatic posing of a meshed human model using point clouds, 2015, Comput. Graph.

[6] Qing Wang, et al. 2D/3D rotation-invariant detection using equivariant filters and kernel weighted mapping, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[8] David A. McAllester, et al. A discriminatively trained, multiscale, deformable part model, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Abdul Nurunnabi, et al. Outlier detection and robust normal-curvature estimation in mobile laser scanning 3D point cloud data, 2015, Pattern Recognit.

[10] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Bernt Schiele, et al. Monocular 3D pose estimation and tracking by detection, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12] Wilmot Li, et al. Tools for placing cuts and transitions in interview video, 2012, ACM Trans. Graph.

[13] Rongrong Ji, et al. Label Propagation from ImageNet to 3D Point Clouds, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Qi Tian, et al. Less is More: Efficient 3-D Object Retrieval With Query View Selection, 2011, IEEE Transactions on Multimedia.

[15] Zhen Wang, et al. A Multiscale and Hierarchical Feature Extraction Method for Terrestrial Laser Scanning Point Cloud Classification, 2015, IEEE Transactions on Geoscience and Remote Sensing.

[16] Yang Wang, et al. Feature fusion for vehicle detection and tracking with low-angle cameras, 2011, 2011 IEEE Workshop on Applications of Computer Vision (WACV).

[17] Liujuan Cao, et al. Single/cross-camera multiple-person tracking by graph matching, 2014, Neurocomputing.

[18] Yue Gao, et al. 3D model comparison using spatial structure circular descriptor, 2010, Pattern Recognit.

[19] Junjie Yan, et al. The Fastest Deformable Part Model for Object Detection, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20] Yaser Sheikh, et al. 3D Pose-by-Detection of Vehicles via Discriminatively Reduced Ensembles of Correlation Filters, 2014, BMVC.

[21] Alexei A. Efros, et al. Ensemble of exemplar-SVMs for object detection and beyond, 2011, 2011 International Conference on Computer Vision.

[22] Yue Gao, et al. Camera Constraint-Free View-Based 3-D Object Retrieval, 2012, IEEE Transactions on Image Processing.

[23] Wolfram Burgard, et al. Instace-Based AMN Classification for Improved Object Recognition in 2D and 3D Laser Range Data, 2007, IJCAI.

[24] Niloy J. Mitra, et al. Coupled structure-from-motion and 3D symmetry detection for urban facades, 2014, ACM Trans. Graph.

[25] Mohan S. Kankanhalli, et al. Multi-view action recognition by cross-domain learning, 2014, 2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP).

[26] Ian Williams, et al. A Statistical Method for Improved 3D Surface Detection, 2015, IEEE Signal Processing Letters.

[27] Yue Gao, et al. Multi-Modal Clique-Graph Matching for View-Based 3D Model Retrieval, 2016, IEEE Transactions on Image Processing.

[28] Anan Liu, et al. Multiple Person Tracking by Spatiotemporal Tracklet Association, 2012, 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance.

[29] Shuicheng Yan, et al. Robust Graph Mode Seeking by Graph Shift, 2010, ICML.

[30] Anni Cai, et al. Enhanced and hierarchical structure algorithm for data imbalance problem in semantic extraction under massive video dataset, 2012, Multimedia Tools and Applications.

[31] Xindong Wu, et al. 3-D Object Retrieval With Hausdorff Distance Learning, 2014, IEEE Transactions on Industrial Electronics.

[32] Dieter Fox, et al. Detection-based object labeling in 3D scenes, 2012, 2012 IEEE International Conference on Robotics and Automation.

[33] Yuting Su, et al. Graph-based characteristic view set extraction and matching for 3D model retrieval, 2015, Inf. Sci.

[34] H. Zhang, et al. Multi-perspective and multi-modality joint representation and recognition model for 3D action recognition, 2015, Neurocomputing.

[35] Tae-Kyun Kim, et al. Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36] Marc Pollefeys, et al. City-Scale Change Detection in Cadastral 3D Models Using Images, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37] Tae-Kyun Kim, et al. Latent-Class Hough Forests for 3D Object Detection and Pose Estimation, 2014, ECCV.

[38] Zan Gao, et al. Multi-view discriminative and structured dictionary learning with group sparsity for human action recognition, 2015, Signal Process.

[39] Yue Gao, et al. 3-D Object Retrieval and Recognition With Hypergraph Analysis, 2012, IEEE Transactions on Image Processing.