Dense Object Reconstruction from RGBD Images with Embedded Deep Shape Representations

Most problems involving simultaneous localization and mapping can nowadays be solved using one of two fundamentally different approaches. The traditional approach is given by a least-squares objective, which minimizes many local photometric or geometric residuals over explicitly parametrized structure and camera parameters. Unmodeled effects that violate the Lambertian surface assumption or the geometric invariances of individual residuals are countered through statistical averaging or the addition of robust kernels and smoothness terms. Aiming at more accurate measurement models and the inclusion of higher-order shape priors, the community has more recently shifted its attention to deep end-to-end models for solving geometric localization and mapping problems. However, at test time, these feed-forward models ignore the more traditional geometric or photometric consistency terms, which limits their ability to recover fine details and can lead to complete failure in corner-case scenarios. With an application to dense object modeling from RGBD images, our work aims at taking the best of both worlds by embedding modern higher-order object shape priors into classical iterative residual minimization objectives. We demonstrate a general ability to improve mapping accuracy with respect to each modality alone, and present a successful application to real data.
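To make the idea of embedding a shape prior into an iterative residual minimization more concrete, the following is a minimal toy sketch and not the method of this paper. It assumes a linear decoder (`mean + basis @ z`) as a stand-in for a deep shape representation, a Huber kernel as the robust kernel, and a quadratic penalty on the latent code `z` as the prior. All names (`fit_latent_code`, `huber_weights`, `demo`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def huber_weights(residuals, delta):
    """IRLS weights for the Huber kernel: 1 inside delta, delta/|r| outside."""
    abs_r = np.abs(residuals)
    weights = np.ones_like(abs_r)
    outside = abs_r > delta
    weights[outside] = delta / abs_r[outside]
    return weights


def fit_latent_code(obs, mean, basis, lam=1e-3, delta=0.05, iters=50):
    """Minimize sum_i huber(obs_i - decode(z)_i) + lam * ||z||^2 over z.

    decode(z) = mean + basis @ z is an ASSUMED linear stand-in for a learned
    shape decoder; the solver is iteratively reweighted least squares.
    """
    z = np.zeros(basis.shape[1])
    for _ in range(iters):
        residuals = obs - (mean + basis @ z)
        w = huber_weights(residuals, delta)
        # Weighted normal equations with a quadratic prior on the code.
        lhs = basis.T @ (w[:, None] * basis) + lam * np.eye(z.size)
        rhs = basis.T @ (w * (obs - mean))
        z = np.linalg.solve(lhs, rhs)
    return z


def demo(seed=0):
    """Synthetic depth-like measurements with gross outliers (made-up data)."""
    rng = np.random.default_rng(seed)
    n_obs, n_latent = 200, 3
    mean = rng.normal(size=n_obs)
    basis = rng.normal(size=(n_obs, n_latent))
    z_true = np.array([0.5, -1.0, 0.25])
    obs = mean + basis @ z_true + 0.01 * rng.normal(size=n_obs)
    obs[:40] += 5.0  # simulate measurements that violate the model
    z_robust = fit_latent_code(obs, mean, basis)
    z_plain = np.linalg.lstsq(basis, obs - mean, rcond=None)[0]
    return z_true, z_robust, z_plain


if __name__ == "__main__":
    z_true, z_robust, z_plain = demo()
    print("robust error:", np.linalg.norm(z_robust - z_true))
    print("plain  error:", np.linalg.norm(z_plain - z_true))
```

In this sketch the unknown is the low-dimensional code rather than the explicit structure, so the prior constrains the shape while the residuals still enforce consistency with the measurements. A real system would replace the linear decoder with a learned network and linearize it at each iteration.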
