A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation

Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a major challenge, since the visibility function that indicates whether a surface point is seen from a camera often cannot be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable, closed-form formulation of surface visibility. In contrast to previous methods, this yields pose similarity energies that are smooth, analytically differentiable, and efficient to optimize, with rigorous occlusion handling, fewer local minima, and experimentally verified improvements in the convergence of numerical optimization. The underlying idea is a new image formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution, which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.
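
To illustrate how a Gaussian density can make visibility smooth, the following minimal sketch evaluates the transmittance along a camera ray through a medium made of isotropic Gaussian blobs. It assumes a Beer-Lambert style absorption model, in which the visibility of a point at depth s is exp(-optical depth up to s), and it uses the closed-form integral of a Gaussian along a line (an error-function expression). The function names, the blob parameterization (center, sigma, magnitude), and the example values are illustrative assumptions, not the exact formulation of the paper.

```python
import math

def optical_depth(origin, direction, s, center, sigma, magnitude):
    """Closed-form integral of one isotropic Gaussian density along a ray.

    Integrates magnitude * exp(-|x - center|^2 / (2 sigma^2)) for
    x = origin + t * direction, t in [0, s]. `direction` is assumed unit length.
    """
    oc = [c - o for c, o in zip(center, origin)]
    t_bar = sum(a * b for a, b in zip(oc, direction))        # ray parameter of closest approach
    r2 = max(sum(a * a for a in oc) - t_bar * t_bar, 0.0)    # squared ray-to-center distance
    scale = magnitude * math.exp(-r2 / (2.0 * sigma ** 2)) * sigma * math.sqrt(math.pi / 2.0)
    k = sigma * math.sqrt(2.0)
    return scale * (math.erf((s - t_bar) / k) - math.erf(-t_bar / k))

def transmittance(origin, direction, s, blobs):
    """Smooth visibility of the point at depth s (assumed Beer-Lambert absorption)."""
    tau = sum(optical_depth(origin, direction, s, c, sig, m) for c, sig, m in blobs)
    return math.exp(-tau)

if __name__ == "__main__":
    # Hypothetical scene: one blob centered 5 units in front of the camera.
    blobs = [((0.0, 0.0, 5.0), 0.5, 4.0)]
    origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    for depth in (2.0, 4.5, 5.0, 5.5, 8.0):
        print(depth, transmittance(origin, direction, depth, blobs))
```

Because the transmittance is a composition of exp and erf, it varies smoothly with the depth and with the blob parameters. Under these assumptions, a point in front of the blob is almost fully visible, a point behind it is almost fully occluded, and the transition between the two is gradual rather than a discrete step.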
