Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras. We pair traditional stereopsis and modern differentiable rendering to build an end-to-end model which predicts textured 3D meshes of objects with varying topologies and shape. We frame stereopsis as an optimization problem and simultaneously update shape and cameras via simple gradient descent. We run an extensive quantitative analysis and compare to traditional multi-view stereo techniques and state-of-the-art learning based methods. We show compelling reconstructions on challenging real-world scenes and for an abundance of object types with complex shape, topology and texture.11Project webpage: https://shubham-goel.github.io/ds/

[1]  Leon L. Xu,et al.  ABO: Dataset and Benchmarks for Real-World 3D Object Understanding , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Shubham Tulsiani,et al.  Shelf-Supervised Mesh Prediction in the Wild , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jitendra Malik,et al.  Shape and Viewpoint without Keypoints , 2020, ECCV.

[4]  Yu-Wing Tai,et al.  Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking , 2020, ECCV.

[5]  Abhinav Gupta,et al.  Implicit Mesh Reconstruction from Unannotated Image Collections , 2020, ArXiv.

[6]  Baoyuan Wu,et al.  Unsupervised Multi-View Constrained Convolutional Network for Accurate Depth Estimation , 2020, IEEE Transactions on Image Processing.

[7]  Abhinav Gupta,et al.  Articulation-Aware Canonical Surface Mapping , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ronen Basri,et al.  Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance , 2020, NeurIPS.

[9]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[10]  Jan Kautz,et al.  Self-supervised Single-view 3D Reconstruction via Semantic Consistency , 2020, ECCV.

[11]  Ross B. Girshick,et al.  PointRend: Image Segmentation As Rendering , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Andreas Geiger,et al.  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Wan-Yen Lo,et al.  Accelerating 3D deep learning with PyTorch3D , 2019, SIGGRAPH Asia 2020 Courses.

[14]  S. Fidler,et al.  Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer , 2019, NeurIPS.

[15]  Jitendra Malik,et al.  Mesh R-CNN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Hao Li,et al.  Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Graham Fyffe,et al.  Stereo Magnification: Learning View Synthesis using Multiplane Images , 2018, ArXiv.

[18]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[19]  Zhengqi Li,et al.  MegaDepth: Learning Single-View Depth Prediction from Internet Photos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[21]  Zhichao Yin,et al.  GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Subhransu Maji,et al.  3D Shape Induction from 2D Views of Multiple Objects , 2016, 2017 International Conference on 3D Vision (3DV).

[26]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[27]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[28]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jitendra Malik,et al.  Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[31]  Henrik Aanæs,et al.  Large Scale Multi-view Stereopsis Evaluation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Marc Pollefeys,et al.  Photometric Bundle Adjustment for Dense Multi-view 3D Modeling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[35]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[36]  Richard Szeliski,et al.  Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.

[37]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  João Paulo Costeira,et al.  Estimating 3D shape from degenerate sequences with missing data , 2009, Comput. Vis. Image Underst..

[39]  Olivier D. Faugeras,et al.  Multi-View Stereo Reconstruction and Scene Flow Estimation with a Global Image-Based Matching Score , 2007, International Journal of Computer Vision.

[40]  Marc Alexa,et al.  Laplacian mesh optimization , 2006, GRAPHITE '06.

[41]  Michael M. Kazhdan,et al.  Poisson surface reconstruction , 2006, SGP '06.

[42]  Greg Turk,et al.  Simplification and Repair of Polygonal Models Using Volumetric Techniques , 2003, IEEE Trans. Vis. Comput. Graph..

[43]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[44]  Gabriel Taubin,et al.  The ball-pivoting algorithm for surface reconstruction , 1999, IEEE Transactions on Visualization and Computer Graphics.

[45]  P. Anandan,et al.  Direct Recovery of Planar-Parallax from Multiple Frames , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[46]  Olivier D. Faugeras,et al.  Complete Dense Stereovision Using Level Set Methods , 1998, ECCV.

[47]  Thomas Ertl,et al.  Computer graphics—principles and practice , 1997 .

[48]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[49]  A. Laurentini,et al.  The Visual Hull Concept for Silhouette-Based Image Understanding , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[50]  Paul J. Besl,et al.  Method for registration of 3-D shapes , 1992, Other Conferences.

[51]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[52]  C. Wheatstone XVIII. Contributions to the physiology of vision. —Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision , 1962, Philosophical Transactions of the Royal Society of London.