Dense Monocular Depth Estimation in Complex Dynamic Scenes

We present an approach to dense depth estimation from a single monocular camera that is moving through a dynamic scene. The approach produces a dense depth map from two consecutive frames. Moving objects are reconstructed along with the surrounding environment. We provide a novel motion segmentation algorithm that segments the optical flow field into a set of motion models, each with its own epipolar geometry. We then show that the scene can be reconstructed based on these motion models by optimizing a convex program. The optimization jointly reasons about the scales of different objects and assembles the scene in a common coordinate frame, determined up to a global scale. Experimental results demonstrate that the presented approach outperforms prior methods for monocular depth estimation in dynamic scenes.
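To make the idea of segmenting a flow field into motion models with separate epipolar geometries concrete, the following is a minimal sketch and not the algorithm presented here. It assumes the candidate fundamental matrices are already given, and it labels each flow correspondence with the model of lowest Sampson distance. The paper instead estimates the models and the segmentation together. The names `sampson_distance` and `assign_to_models` are hypothetical and introduced only for illustration.

```python
import numpy as np


def sampson_distance(F, x1, x2):
    """First-order geometric error of correspondences x1 <-> x2 under F.

    F: (3, 3) fundamental matrix (assumed given, not estimated here).
    x1, x2: (N, 2) pixel coordinates in the first and second frame,
            where x2 = x1 + flow.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])            # homogeneous points, frame 1
    p2 = np.hstack([x2, ones])            # homogeneous points, frame 2
    Fp1 = p1 @ F.T                        # row i is (F @ p1_i)
    Ftp2 = p2 @ F                         # row i is (F.T @ p2_i)
    num = np.sum(p2 * Fp1, axis=1) ** 2   # (p2^T F p1)^2
    den = Fp1[:, 0] ** 2 + Fp1[:, 1] ** 2 + Ftp2[:, 0] ** 2 + Ftp2[:, 1] ** 2
    return num / np.maximum(den, 1e-12)   # guard against division by zero


def assign_to_models(models, x1, x2):
    """Label each correspondence with the index of its best-fitting model.

    models: list of (3, 3) fundamental matrices, one per motion model.
    Returns (labels, residuals) with shapes (N,) and (len(models), N).
    """
    residuals = np.stack([sampson_distance(F, x1, x2) for F in models])
    return np.argmin(residuals, axis=0), residuals


# Illustrative data: two pure translations with identity intrinsics.
F_horizontal = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
F_vertical = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])

x1 = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [70.0, 80.0]])
flow = np.array([[3.0, 0.0], [5.0, 0.0], [0.0, 4.0], [0.0, -1.0]])
labels, residuals = assign_to_models([F_horizontal, F_vertical], x1, x1 + flow)
```

A per-pixel nearest-model assignment of this kind ignores spatial coherence and model selection, so it is best read as a description of the data term only. Recovering a common coordinate frame additionally requires resolving the relative scale of each motion model, which the sketch does not attempt.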
