Analytical Derivatives for Differentiable Renderer: 3D Pose Estimation by Silhouette Consistency

Differentiable rendering is widely used in optimization-based 3D reconstruction, which requires gradients from differentiable operations for gradient-based optimization. Existing differentiable renderers obtain the gradients of rendering via numerical techniques, which limits their accuracy and efficiency. Motivated by this, a differentiable mesh renderer with analytical gradients is proposed. The main obstacle to making rasterization-based rendering differentiable is the discrete sampling operation. To make rasterization differentiable, the pixel intensity is defined as a double integral over the pixel area, and the integral is approximated by anti-aliasing with an average filter. The analytical gradients with respect to the vertex coordinates can then be derived from this continuous definition of pixel intensity. To demonstrate the effectiveness and efficiency of the proposed differentiable renderer, experiments on 3D pose estimation from multi-viewpoint silhouettes alone were conducted. The experimental results show that 3D pose estimation without 3D or 2D joint supervision can produce competitive results both qualitatively and quantitatively. They also show that the proposed differentiable renderer is more accurate and more efficient than a previous differentiable renderer.
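
To make the continuous definition of pixel intensity concrete, the following is a minimal sketch of the forward approximation only: the silhouette value of a pixel is taken as the average of a triangle's inside/outside indicator over the pixel area, estimated with an n x n grid of sub-samples (an average, or box, filter). The function names, the unit-square pixel convention, and the sub-sample layout are assumptions made for illustration; the sketch does not reproduce the analytical gradients described above.

```python
import numpy as np

def edge_function(a, b, p):
    # Signed-area term: positive when p lies to the left of the directed edge a -> b.
    return (b[0] - a[0]) * (p[..., 1] - a[1]) - (b[1] - a[1]) * (p[..., 0] - a[0])

def pixel_coverage(tri, px, py, n=4):
    """Approximate the mean of the triangle indicator over the pixel
    [px, px+1] x [py, py+1] with an n x n grid of sub-samples.

    Hypothetical helper: `tri` is three (x, y) vertices in pixel units.
    """
    a, b, c = np.asarray(tri, dtype=float)
    offsets = (np.arange(n) + 0.5) / n          # sub-sample centres inside the pixel
    xs, ys = np.meshgrid(px + offsets, py + offsets)
    p = np.stack([xs, ys], axis=-1)             # shape (n, n, 2)
    e0 = edge_function(a, b, p)
    e1 = edge_function(b, c, p)
    e2 = edge_function(c, a, p)
    # Accept either winding order: all edge terms share the same sign.
    inside = ((e0 >= 0) & (e1 >= 0) & (e2 >= 0)) | ((e0 <= 0) & (e1 <= 0) & (e2 <= 0))
    return float(inside.mean())                 # average filter over the sub-samples

if __name__ == "__main__":
    tri = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    print(pixel_coverage(tri, 1, 1))        # pixel well inside the triangle
    print(pixel_coverage(tri, 20, 20))      # pixel well outside the triangle
    print(pixel_coverage(tri, 4, 5, n=64))  # pixel cut by the hypotenuse
```

A pixel crossed by an edge receives a fractional value rather than a hard 0 or 1, which is what replaces the discrete sampling step. With a finite sample grid the estimate is still piecewise constant in the vertex positions, so gradients have to come from the continuous integral definition rather than from differentiating this sampled estimate.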
