Paper Information - Dense Monocular Depth Estimation in Complex Dynamic Scenes

We present an approach to dense depth estimation from a single monocular camera that is moving through a dynamic scene. The approach produces a dense depth map from two consecutive frames. Moving objects are reconstructed along with the surrounding environment. We provide a novel motion segmentation algorithm that segments the optical flow field into a set of motion models, each with its own epipolar geometry. We then show that the scene can be reconstructed based on these motion models by optimizing a convex program. The optimization jointly reasons about the scales of different objects and assembles the scene in a common coordinate frame, determined up to a global scale. Experimental results demonstrate that the presented approach outperforms prior methods for monocular depth estimation in dynamic scenes.
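The idea that each motion model carries its own epipolar geometry can be illustrated with a small sketch. This is not the paper's segmentation algorithm, which the abstract does not specify in detail. It only shows one common way to score two-frame correspondences against candidate fundamental matrices (the Sampson distance) and label each correspondence with the best-fitting model. The function names (`skew`, `sampson_error`, `assign_to_models`), the identity camera intrinsics, and the pure-translation toy motions are all assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def sampson_error(F, x1, x2):
    """First-order geometric error of correspondences under fundamental matrix F.

    x1, x2: (N, 3) homogeneous image points in frame 1 and frame 2.
    Returns an (N,) array of non-negative residuals.
    """
    Fx1 = x1 @ F.T          # row i is F @ x1[i]
    Ftx2 = x2 @ F           # row i is F.T @ x2[i]
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / np.maximum(den, 1e-12)

def assign_to_models(models, x1, x2):
    """Label each correspondence with the index of the lowest-residual model."""
    errors = np.stack([sampson_error(F, x1, x2) for F in models], axis=0)
    return np.argmin(errors, axis=0)

def project(X):
    """Pinhole projection with identity intrinsics (an assumption of this toy)."""
    return X / X[:, 2:3]

# Toy scene: 40 points, half moving by t_a and half by t_b relative to the camera.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 40),
                     rng.uniform(-1, 1, 40),
                     rng.uniform(2, 5, 40)])
t_a = np.array([1.0, 0.0, 0.0])
t_b = np.array([0.0, 1.0, 0.0])
true_labels = np.repeat([0, 1], 20)
moved = X + np.where(true_labels[:, None] == 0, t_a, t_b)

x1, x2 = project(X), project(moved)
# With identity intrinsics and no rotation, F reduces to [t]_x for each motion.
labels = assign_to_models([skew(t_a), skew(t_b)], x1, x2)
```

In this noise-free toy, every correspondence has near-zero residual under its own motion and a clearly larger one under the other, so `labels` matches `true_labels`. Real flow fields are noisy and the number of motion models is unknown, which is why a dedicated segmentation method is needed in practice.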
