Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes

Monocular depth reconstruction of complex and dynamic scenes is a highly challenging problem. While learning-based methods have offered promising results for rigid scenes, even in unsupervised settings, there is little to no literature addressing dynamic and deformable scenes. In this work, we present an unsupervised monocular framework for dense depth estimation of dynamic scenes, which jointly reconstructs rigid and non-rigid parts without explicitly modelling the camera motion. Using dense correspondences, we derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points. In this process, the dense depth map is learned implicitly using the as-rigid-as-possible hypothesis. Our method provides promising results, demonstrating its capability of reconstructing 3D from challenging videos of non-rigid scenes. Furthermore, the proposed method also provides unsupervised motion segmentation results as an auxiliary output.
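
The idea of preserving pairwise distances can be illustrated with a minimal NumPy sketch. It is not the training objective used in this work: the function names, the pinhole intrinsics `K`, and the plain L1 penalty are all assumptions made for illustration. Corresponding pixels in two frames are lifted to 3D using their predicted depths, and the distances between every pair of points are compared across the frames. Because distances are invariant to rigid motion, no camera pose is needed.

```python
import numpy as np

def backproject(pixels, depth, K):
    """Lift N pixels (N, 2) with depths (N,) to camera-frame 3D points (N, 3)."""
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    rays = homog @ np.linalg.inv(K).T      # viewing rays with unit z-component
    return rays * depth[:, None]

def pairwise_distances(points):
    """Euclidean distance between every pair of 3D points, shape (N, N)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def distance_preservation_loss(pix_a, depth_a, pix_b, depth_b, K):
    """Mean absolute change in pairwise 3D distances between two frames.

    Row i of pix_a and row i of pix_b are assumed to be a dense
    correspondence (e.g. from optical flow). The L1 penalty is an
    illustrative choice, not the paper's formulation.
    """
    pts_a = backproject(pix_a, depth_a, K)
    pts_b = backproject(pix_b, depth_b, K)
    return np.mean(np.abs(pairwise_distances(pts_a) - pairwise_distances(pts_b)))
```

For a scene that moves rigidly between the two frames, correct depths give a loss of zero regardless of the camera motion, whereas inconsistent depths change the pairwise distances and are penalised.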
