Temporally Coherent General Dynamic Scene Reconstruction

Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view wide-baseline static or moving cameras without prior knowledge of the scene structure, appearance, or illumination. The contributions of the work are: an automatic method for initial coarse reconstruction to initialize joint estimation; sparse-to-dense temporal correspondence integrated with joint multi-view segmentation and reconstruction to introduce temporal coherence; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes by introducing a shape constraint. Comparison with state-of-the-art approaches on a variety of complex indoor and outdoor scenes demonstrates improved accuracy in both multi-view segmentation and dense reconstruction. This paper demonstrates unsupervised reconstruction of complete temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction, and their use in applications such as free-view rendering and virtual reality.
