General Dynamic Scene Reconstruction from Multiple View Video

This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and the background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor dynamic scene reconstruction assume prior knowledge of the static background appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure, and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance.
