Pixel2Mesh: 3D Mesh Model Generation via Image Guided Deformation

In this paper, we propose an end-to-end deep learning architecture that generates 3D triangular meshes from single color images. Limited by the nature of prevalent deep learning techniques, most previous works represent 3D shapes as volumes or point clouds. However, it is non-trivial to convert these representations into compact, ready-to-use mesh models. Unlike existing methods, our network represents 3D shapes as meshes, which are essentially graphs and are therefore well suited to graph-based convolutional neural networks. Leveraging perceptual features extracted from the input image, our network produces correct geometry by progressively deforming an ellipsoid. To make the whole deformation procedure stable, we adopt a coarse-to-fine strategy and define various mesh- and surface-related losses to capture properties at different levels; this guarantees visually appealing and physically accurate 3D geometry. In addition to producing accurate 3D shapes on the ShapeNet dataset, our model can by nature be adapted to objects in specific domains, e.g. human faces, and easily extended to learn per-vertex properties, e.g. color. Extensive experiments show that our method not only produces mesh models with qualitatively better details, but also achieves higher 3D shape estimation accuracy than the state of the art.
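To make the idea of treating a mesh as a graph more concrete, the following is a minimal sketch of one graph-convolution step that predicts per-vertex offsets and applies them to a mesh. It is an illustration under stated assumptions, not the network described above: the function names, the particular update rule (own features plus summed neighbour features), the random weights, and the tetrahedron used in place of an ellipsoid are all hypothetical, and image features are omitted entirely.

```python
import numpy as np


def adjacency_from_faces(num_vertices, faces):
    """Build a symmetric 0/1 adjacency matrix from triangle faces."""
    adj = np.zeros((num_vertices, num_vertices), dtype=np.float64)
    for a, b, c in faces:
        # Each triangle contributes its three edges to the graph.
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = 1.0
            adj[j, i] = 1.0
    return adj


def graph_conv(features, adj, w_self, w_neigh):
    """One graph convolution (assumed form): each vertex combines its own
    features with the sum of its neighbours' features."""
    return features @ w_self + adj @ features @ w_neigh


# A tetrahedron stands in for the initial ellipsoid (assumption for brevity).
vertices = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
)
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
adj = adjacency_from_faces(len(vertices), faces)

# Random weights stand in for learned parameters.
rng = np.random.default_rng(0)
w_self = rng.normal(scale=0.1, size=(3, 3))
w_neigh = rng.normal(scale=0.1, size=(3, 3))

# Treat the layer output as per-vertex offsets and deform the mesh.
offsets = graph_conv(vertices, adj, w_self, w_neigh)
deformed = vertices + offsets
```

In this sketch the vertex coordinates themselves serve as input features; a trained model would presumably also feed in features pooled from the image and stack several such layers at increasing mesh resolution.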
