Learning Priors for Semantic 3D Reconstruction

We present a novel semantic 3D reconstruction framework that embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while retaining a high level of expressiveness, which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy than a purely variational approach while requiring two orders of magnitude fewer iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.
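To make the idea of "a fixed number of unrolled optimization iterations with shared interaction weights" concrete, the following is a minimal sketch and not the authors' architecture. It assumes a single scale, a 1D voxel grid, a first-order primal-dual update, and a hypothetical label-interaction matrix `W` that is reused in every iteration. All function names, step sizes, and the data cost `f` are illustrative assumptions; the multi-scale, coarse-to-fine part described in the abstract is omitted.

```python
import numpy as np

def forward_diff(u):
    # Forward differences along the voxel axis; last entry is zero (Neumann boundary).
    d = np.zeros_like(u)
    d[:, :-1] = u[:, 1:] - u[:, :-1]
    return d

def backward_div(p):
    # Negative adjoint of forward_diff (discrete divergence).
    d = np.zeros_like(p)
    d[:, 0] = p[:, 0]
    d[:, 1:-1] = p[:, 1:-1] - p[:, :-2]
    d[:, -1] = -p[:, -2]
    return d

def project_simplex(v):
    # Euclidean projection of each column of v onto the probability simplex
    # (sort-based method).
    num_labels, num_voxels = v.shape
    s = -np.sort(-v, axis=0)                      # sorted in descending order
    css = np.cumsum(s, axis=0) - 1.0
    k = np.arange(1, num_labels + 1)[:, None]
    rho = (s - css / k > 0).sum(axis=0)           # size of the active set per column
    theta = css[rho - 1, np.arange(num_voxels)] / rho
    return np.maximum(v - theta, 0.0)

def unrolled_primal_dual(f, W, num_iters=10, sigma=0.5, tau=0.5):
    # f: (labels, voxels) data cost. W: (labels, labels) interaction weights,
    # shared by all unrolled iterations (hypothetical parameterization).
    num_labels, _ = f.shape
    u = np.full(f.shape, 1.0 / num_labels)        # uniform initial labeling
    u_bar = u.copy()
    p = np.zeros_like(f)
    for _ in range(num_iters):                    # fixed iteration count, no convergence test
        # Dual ascent on the label-mixed gradient, then clamp to [-1, 1].
        p = np.clip(p + sigma * (W @ forward_diff(u_bar)), -1.0, 1.0)
        # Primal descent using the adjoint operator, then simplex projection.
        u_new = project_simplex(u - tau * (f - backward_div(W.T @ p)))
        u_bar = 2.0 * u_new - u                   # over-relaxation step
        u = u_new
    return u

# Usage: two labels on eight voxels, identity interaction weights.
f = np.ones((2, 8))
f[0, :4] = 0.0
f[1, 4:] = 0.0
u = unrolled_primal_dual(f, np.eye(2), num_iters=20)
```

In a trainable version, `W` (and possibly the step sizes) would be learned by backpropagating through the loop. Because the same `W` is used in every iteration, the parameter count does not grow with the number of unrolled steps.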
