Learning a Multi-View Stereo Machine

We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.

[1]  Miss A.O. Penney (b) , 1974, The New Yale Book of Quotations.

[2]  Tomaso Poggio,et al.  Cooperative computation of stereo disparity , 1988 .

[3]  Geoffrey E. Hinton,et al.  Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.

[4]  A. Laurentini,et al.  The Visual Hull Concept for Silhouette-Based Image Understanding , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[6]  Robert T. Collins,et al.  A space-sweep approach to true multi-image matching , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[8]  Takeo Kanade,et al.  Development of a Video-Rate Stereo Machine , 1997 .

[9]  Thomas Vetter,et al.  A morphable model for the synthesis of 3D faces , 1999, SIGGRAPH.

[10]  Ruigang Yang,et al.  Real‐Time Consensus‐Based Scene Reconstruction Using Commodity Graphics Hardware † , 2003, Comput. Graph. Forum.

[11]  Reinhard Koch,et al.  Visual Modeling with a Hand-Held Camera , 2004, International Journal of Computer Vision.

[12]  Kiriakos N. Kutulakos,et al.  A Theory of Shape by Space Carving , 2000, International Journal of Computer Vision.

[13]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[14]  Roberto Cipolla,et al.  Multi-view stereo via volumetric graph-cuts , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Long Quan,et al.  A quasi-dense approach to surface reconstruction from uncalibrated images , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Jan-Michael Frahm,et al.  Real-Time Visibility-Based Fusion of Depth Maps , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Pau Gargallo,et al.  Minimizing the Reprojection Error in Surface Reconstruction from Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19]  Joseph L. Mundy,et al.  Change Detection in a 3-d World , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Ashutosh Saxena,et al.  Depth Estimation Using Monocular and Stereo Cues , 2007, IJCAI.

[21]  Jan-Michael Frahm,et al.  Detailed Real-Time Urban 3D Reconstruction from Video , 2007, International Journal of Computer Vision.

[22]  Victor S. Lempitsky,et al.  Global Optimization for Shape Fitting , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jean-Philippe Pons,et al.  Efficient Multi-View Reconstruction of Large-Scale Scenes using Interest Points, Delaunay Triangulation and Graph Cuts , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[24]  Marc Pollefeys,et al.  Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  J. Ponce,et al.  Accurate, Dense, and Robust Multi-View Stereopsis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Horst Bischof,et al.  A Globally Optimal Algorithm for Robust TV-L1 Range Image Integration , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Daniel Cremers,et al.  Multiview Stereo and Silhouette Consistency via Convex Functionals over Convex Domains , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Marc Pollefeys,et al.  Joint 3D Scene Reconstruction and Class Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Silvio Savarese,et al.  Dense Object Reconstruction with Semantic Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Ian D. Reid,et al.  Dense Reconstruction Using 3D Object Shape Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[33]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[34]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Marc Pollefeys,et al.  Real-Time Direct Dense Matching on Fisheye Images Using Plane-Sweeping Stereo , 2014, 2014 2nd International Conference on 3D Vision.

[36]  Marc Pollefeys,et al.  Class Specific 3D Object Shape Priors Using Surface Normals , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  D. Cooper,et al.  Statistical Inverse Ray Tracing for Image-Based 3D Modeling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Jitendra Malik,et al.  Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Michael J. Black,et al.  Towards Probabilistic Volumetric Reconstruction Using Ray Potentials , 2015, 2015 International Conference on 3D Vision.

[40]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[41]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[43]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[44]  Max Jaderberg,et al.  Unsupervised Learning of 3D Structure from Images , 2016, NIPS.

[45]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[47]  Abhinav Gupta,et al.  Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[48]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Thomas Brox,et al.  Multi-view 3D Models from Single Images with a Convolutional Network , 2015, ECCV.

[50]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[51]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[52]  Karthik Ramani,et al.  Deep Learning 3D Shape Surfaces Using Geometry Images , 2016, ECCV.

[53]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[54]  Marc Pollefeys,et al.  Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[56]  Marc Pollefeys,et al.  Dense Semantic 3D Reconstruction , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Jitendra Malik,et al.  Learning Category-Specific Deformable 3D Models for Object Reconstruction , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Jitendra Malik,et al.  Hierarchical Surface Prediction for 3D Object Reconstruction , 2017, 2017 International Conference on 3D Vision (3DV).

[60]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Horst Bischof,et al.  OctNetFusion: Learning Depth Fusion from Data , 2017, 2017 International Conference on 3D Vision (3DV).

[62]  Luc Van Gool,et al.  Learned Multi-patch Similarity , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[63]  Alexei A. Efros,et al.  Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).