Learning non-rigid surface reconstruction from spatio-temporal image patches

We present a method to reconstruct a dense spatio-temporal depth map of a non-rigidly deformable object directly from a video sequence. Depth is estimated locally on spatio-temporal patches of the video, and the full depth video of the entire shape is then recovered by combining the patch estimates. Since the geometry of a local spatio-temporal patch of a deforming non-rigid object is often simple enough to be faithfully represented with a parametric model, we synthetically generate a database of small deforming rectangular meshes rendered with different material properties and lighting conditions, along with their corresponding depth videos, and use this data to train a convolutional neural network. We tested our method on both synthetic and Kinect data and observed experimentally that the reconstruction error is significantly lower than that obtained with other approaches such as conventional non-rigid structure from motion.
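
The abstract does not specify how the local patch estimates are combined into the full depth video. As a minimal sketch, assuming the simplest possible choice (averaging overlapping patch predictions on a regular grid), the combination step could look like the following. The function names `patch_origins` and `combine_patches`, the patch size, and the stride are all hypothetical and chosen for illustration only; the trained network that produces each depth patch is not shown.

```python
import numpy as np


def patch_origins(video_shape, patch_shape, stride):
    """List the (t0, y0, x0) corners of a regular grid of spatio-temporal patches.

    Hypothetical helper: assumes the strides tile the video exactly.
    """
    T, H, W = video_shape
    t, h, w = patch_shape
    st, sy, sx = stride
    return [
        (t0, y0, x0)
        for t0 in range(0, T - t + 1, st)
        for y0 in range(0, H - h + 1, sy)
        for x0 in range(0, W - w + 1, sx)
    ]


def combine_patches(patch_depths, origins, video_shape):
    """Average overlapping depth patches into one depth video of shape (T, H, W).

    Assumption: plain averaging; the paper may use a different scheme.
    Voxels not covered by any patch are returned as NaN.
    """
    depth_sum = np.zeros(video_shape, dtype=np.float64)
    count = np.zeros(video_shape, dtype=np.float64)
    for patch, (t0, y0, x0) in zip(patch_depths, origins):
        t, h, w = patch.shape
        depth_sum[t0:t0 + t, y0:y0 + h, x0:x0 + w] += patch
        count[t0:t0 + t, y0:y0 + h, x0:x0 + w] += 1.0
    depth = np.full(video_shape, np.nan)
    covered = count > 0
    depth[covered] = depth_sum[covered] / count[covered]
    return depth
```

In use, each patch in `patch_depths` would be the network's depth prediction for the video patch at the matching origin. Plain averaging ignores any per-patch depth offset or scale ambiguity, so a real pipeline would likely need an alignment step before blending.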
