Depth synthesis and local warps for plausible image-based navigation

Modern camera calibration and multiview stereo techniques enable users to smoothly navigate between different views of a scene captured using standard cameras. The underlying automatic 3D reconstruction methods work well for buildings and regular structures but often fail on vegetation, vehicles, and other complex geometry present in everyday urban scenes. The resulting missing depth information makes Image-Based Rendering (IBR) for such scenes very challenging. Our goal is to provide plausible free-viewpoint navigation for such datasets. To do this, we introduce a new IBR algorithm that is robust to missing or unreliable geometry, providing plausible novel views even in regions quite far from the input camera positions. We first oversegment the input images, creating superpixels of homogeneous color content, which tend to preserve depth discontinuities. We then introduce a depth synthesis approach for poorly reconstructed regions, based on a graph structure built on the oversegmentation and an appropriate traversal of that graph. The superpixels, augmented with synthesized depth, allow us to define a local shape-preserving warp that compensates for inaccurate depth. Our rendering algorithm blends the warped images and generates plausible image-based novel views for our challenging target scenes. Our results demonstrate real-time novel view synthesis for multiple challenging scenes with significant depth complexity, providing a convincing immersive navigation experience.
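
The depth synthesis step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that depth is synthesized through a graph over the oversegmentation and a traversal of that graph. The sketch assumes that each superpixel is a graph node, that edge costs measure color dissimilarity between adjacent superpixels, and that a superpixel without reconstructed depth takes the median depth of its `k` closest superpixels (by shortest-path cost) that do have depth. The function name `synthesize_depth`, the parameter `k`, the median rule, and the dictionary-based graph layout are all hypothetical.

```python
import heapq
from statistics import median


def synthesize_depth(adjacency, known_depth, k=3):
    """Fill in depth for superpixels that have none (illustrative sketch).

    adjacency:   {node: {neighbor: cost}}, an undirected superpixel graph whose
                 edge costs are assumed to encode color dissimilarity.
    known_depth: {node: depth} for superpixels with reconstructed depth.
    k:           number of nearest depth-bearing superpixels to use (assumed).
    """
    result = dict(known_depth)
    for target in adjacency:
        if target in known_depth:
            continue
        # Dijkstra traversal from the target superpixel; stop as soon as
        # k superpixels with known depth have been reached.
        best = {target: 0.0}
        heap = [(0.0, target)]
        visited = set()
        found = []
        while heap and len(found) < k:
            cost, node = heapq.heappop(heap)
            if node in visited:
                continue
            visited.add(node)
            if node in known_depth:
                found.append(known_depth[node])
            for neighbor, edge_cost in adjacency[node].items():
                new_cost = cost + edge_cost
                if new_cost < best.get(neighbor, float("inf")):
                    best[neighbor] = new_cost
                    heapq.heappush(heap, (new_cost, neighbor))
        if found:
            # Assumed aggregation rule: median of the depths that were found.
            result[target] = median(found)
    return result


# Toy graph: superpixel 1 has no depth; 0 and 2 are similar in color,
# 3 is adjacent but very different (high edge cost).
graph = {
    0: {1: 0.1},
    1: {0: 0.1, 2: 0.2, 3: 5.0},
    2: {1: 0.2},
    3: {1: 5.0},
}
depths = synthesize_depth(graph, {0: 10.0, 2: 12.0, 3: 100.0}, k=2)
```

In this toy graph the dissimilar neighbor (superpixel 3) is never reached before two similar superpixels are found, so its very different depth does not influence the synthesized value. This mirrors the intuition in the abstract that superpixels of homogeneous color tend to respect depth discontinuities.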
