3D Scene Flow Estimation with a Piecewise Rigid Scene Model

Abstract: 3D scene flow estimation aims to jointly recover dense geometry and 3D motion from stereoscopic image sequences, and thus generalizes classical disparity and 2D optical flow estimation. To realize its conceptual benefits and overcome limitations of many existing methods, we propose to represent the dynamic scene as a collection of rigidly moving planes, into which the input images are segmented. Geometry and 3D motion are then jointly recovered alongside an over-segmentation of the scene. This piecewise rigid scene model is significantly more parsimonious than conventional pixel-based representations, yet retains the ability to represent real-world scenes with independent object motion. It furthermore enables us to define suitable scene priors, perform occlusion reasoning, and leverage discrete optimization schemes toward stable and accurate results. Assuming that the rigid motion persists approximately over time additionally enables us to incorporate multiple frames into the inference. To that end, each view holds its own representation, which is encouraged to be consistent across all other viewpoints and frames in a temporal window. We show that such a view-consistent multi-frame scheme significantly improves accuracy, especially in the presence of occlusions, and increases robustness against adverse imaging conditions. Our method currently achieves leading performance on the KITTI benchmark, for both flow and stereo.
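To illustrate what a "rigidly moving plane" implies for a single pixel, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera with a hypothetical focal length and principal point, a plane written as the set of points X with normal . X = 1 (a scaled normal), and a rigid motion given by a rotation matrix and a translation vector. All function and parameter names are illustrative.

```python
def backproject(pixel, normal, focal, center):
    """Intersect the viewing ray of `pixel` with the plane {X : normal . X = 1}."""
    ray = ((pixel[0] - center[0]) / focal, (pixel[1] - center[1]) / focal, 1.0)
    depth = 1.0 / sum(n * r for n, r in zip(normal, ray))
    return tuple(depth * r for r in ray)


def rigid_move(point, rotation, translation):
    """Apply X' = R X + t to a 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, focal, center):
    """Pinhole projection of a 3D point to pixel coordinates."""
    return (focal * point[0] / point[2] + center[0],
            focal * point[1] / point[2] + center[1])


def plane_scene_flow(pixel, normal, rotation, translation, focal, center):
    """Return the 3D point, its 3D displacement, and the induced 2D flow."""
    p0 = backproject(pixel, normal, focal, center)
    p1 = rigid_move(p0, rotation, translation)
    flow3d = tuple(b - a for a, b in zip(p0, p1))
    q = project(p1, focal, center)
    flow2d = (q[0] - pixel[0], q[1] - pixel[1])
    return p0, flow3d, flow2d


# Hypothetical setup: fronto-parallel plane at depth 10, pure sideways translation.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
point, flow3d, flow2d = plane_scene_flow(
    pixel=(320.0, 240.0),
    normal=(0.0, 0.0, 0.1),
    rotation=IDENTITY,
    translation=(1.0, 0.0, 0.0),
    focal=500.0,
    center=(320.0, 240.0),
)
print(point, flow3d, flow2d)
```

In this toy setup the pixel at the principal point lies at depth 10 on the plane, moves by one unit along x in 3D, and shifts by 50 pixels horizontally in the image. Every pixel assigned to the same plane shares the same nine parameters (three for the plane, six for the rigid motion), which is what makes such a representation compact compared with storing depth and motion per pixel.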
