SemanticFusion: Joint Labeling, Tracking and Mapping

Kick-started by the release of the well-known KinectFusion, recent research on RGBD-based dense volume reconstruction has focused on addressing various shortcomings of the original algorithm. In this paper we tackle two of them: drift in the camera trajectory, caused by the accumulation of small per-frame tracking errors, and the lack of semantic information in the output of the algorithm. Accordingly, we present an extended KinectFusion pipeline that takes into account per-pixel semantic labels gathered from the input frames. Using these cues, we extend the memory structure holding the reconstructed environment so as to store per-voxel information on the kinds of object likely to appear at each spatial location. We then take this information into account during the camera localization step to increase the accuracy of the estimated camera trajectory. Thus, we realize a SemanticFusion loop in which per-frame labels help to track the camera better, and successful tracking makes it possible to consolidate instantaneous semantic observations into a coherent volumetric map.
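
As a rough illustration of the loop described above, the following minimal sketch stores a per-voxel histogram of observed labels and uses it to weight a tracking residual. It is not the authors' implementation: the class `SemanticVolume`, the function `weighted_residual`, the three-label set, and the choice of weighting the residual by the label's relative frequency are all assumptions made for the example.

```python
from collections import defaultdict

NUM_LABELS = 3  # hypothetical label set, e.g. 0=floor, 1=wall, 2=chair


class SemanticVolume:
    """Sparse voxel map storing a per-voxel histogram of observed labels."""

    def __init__(self, num_labels=NUM_LABELS):
        self.num_labels = num_labels
        # voxel index (i, j, k) -> list of accumulated label weights
        self.counts = defaultdict(lambda: [0.0] * num_labels)

    def integrate(self, voxel, label, weight=1.0):
        """Accumulate one labeled observation into a voxel."""
        self.counts[voxel][label] += weight

    def label_probability(self, voxel, label):
        """Relative frequency of `label` at `voxel`; uniform if unobserved."""
        hist = self.counts.get(voxel)  # .get does not create a new entry
        if hist is None or sum(hist) == 0:
            return 1.0 / self.num_labels
        return hist[label] / sum(hist)


def weighted_residual(volume, voxel, observed_label, geometric_residual):
    """Scale a tracking residual by the agreement between the label
    observed in the current frame and the labels stored in the map
    (an assumed weighting scheme, chosen only for illustration)."""
    agreement = volume.label_probability(voxel, observed_label)
    return agreement * geometric_residual


if __name__ == "__main__":
    vol = SemanticVolume()
    # Mapping side: consolidate per-frame labels into the volume.
    for label in (2, 2, 1):
        vol.integrate((0, 0, 0), label)
    # Tracking side: a pixel labeled 2 that projects onto this voxel
    # contributes more than one whose label disagrees with the map.
    print(weighted_residual(vol, (0, 0, 0), 2, 0.3))
    print(weighted_residual(vol, (0, 0, 0), 0, 0.3))
```

In this sketch the mapping side only ever calls `integrate`, while the tracking side only reads `label_probability`, which mirrors the two halves of the loop: labels inform tracking, and tracked frames feed labels back into the map.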
