3D video acquisition, representation & editing

3D video is a new kind of media that captures the three-dimensional appearance and dynamics of real-world scenes by combining data from multiple input video streams into a consistent model. During playback, it lets the viewer freely choose the viewpoint of a virtual camera in space and time. In this thesis, we present a 3D video system covering the full processing pipeline, from acquisition through data representation, post-processing, and editing to high-quality display.

The focus of our research lies on 3D video data representations. We investigate both image-based and geometric models. While the former offer advantages in achievable output image quality, the latter provide more flexibility for applications beyond pure playback. In particular, we investigate methods for data streaming and compression, dynamic out-of-core data structures, and interactive 3D video editing.

For data acquisition, we introduce a scalable system based on multiple sparsely placed 3D video bricks. Each brick captures high-quality depth maps of the scene from its respective viewpoint, using a combination of spacetime stereo vision and structured light projection. Texture images and pattern-augmented views are acquired simultaneously by time-multiplexed projections and synchronized camera exposures.

Our image-space representation, consisting of billboard planes augmented by detailed displacement maps, combines the generality of acquired geometry with the regularization properties of purely image-based methods. Because they are placed in the disparity space of the acquisition cameras, the billboards provide a regular sampling of the scene with a uniform error model. Based on that, we propose a geometry filtering method which generates spatially and temporally coherent models and removes reconstruction noise as well as calibration errors. Rendering is performed using a GPU-accelerated algorithm which generates consistent view-dependent geometry and texture for each individual viewpoint.
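The uniform error model of disparity space mentioned above can be illustrated with the standard relation for a rectified stereo pair, z = f * b / d. The following is a minimal sketch, not the system's implementation: the function names, focal length, baseline, and disparity noise level are all hypothetical values chosen for illustration. It shows that a constant disparity error corresponds to a depth error that grows quadratically with depth, which is one reason to filter geometry in disparity space rather than in world space.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth for a rectified stereo pair: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: |dz/dd| * sigma_d = z^2 / (f * b) * sigma_d."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Hypothetical camera parameters, for illustration only.
FOCAL_PX = 1000.0    # focal length in pixels
BASELINE_M = 0.2     # stereo baseline in meters
SIGMA_D_PX = 0.5     # constant disparity uncertainty in pixels

for d in (100.0, 50.0, 25.0):
    z = disparity_to_depth(d, FOCAL_PX, BASELINE_M)
    err = depth_error(z, FOCAL_PX, BASELINE_M, SIGMA_D_PX)
    print(f"disparity {d:6.1f} px -> depth {z:5.2f} m, depth error {err:.3f} m")
```

Halving the disparity doubles the depth but quadruples the depth error, while the error expressed in disparity units stays the same for every sample.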
Point samples are the fundamental primitive of our second representation. By carrying multiple surface attributes such as position and color, they provide a unified model of the geometry and appearance of natural scenes. This representation allows various post-processing algorithms to be applied to improve noisy input geometry. In particular, we propose a method that effectively removes outliers by enforcing photo consistency with all input views. By augmenting the points with a statistical model of acquisition noise, smooth images of the scene from novel viewpoints can be generated using a probabilistic renderer based on GPU-accelerated EWA volume splatting.
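The idea of removing outliers by enforcing photo consistency can be sketched as follows. The abstract does not specify the actual consistency criterion, so this example substitutes a generic one: a point is kept only if the colors observed for it in the input views agree to within a threshold. The data layout, the function names, and the threshold `max_spread` are assumptions for illustration, and the colors are assumed to have been sampled already by projecting each point into the views in which it is visible.

```python
def color_spread(samples):
    """Largest per-channel standard deviation over a list of RGB samples."""
    n = len(samples)
    spread = 0.0
    for channel in range(3):
        mean = sum(s[channel] for s in samples) / n
        variance = sum((s[channel] - mean) ** 2 for s in samples) / n
        spread = max(spread, variance ** 0.5)
    return spread


def remove_outliers(points, max_spread=20.0):
    """Keep points seen in at least two views whose colors agree across views.

    `max_spread` is a hypothetical threshold in 8-bit color units.
    """
    return [
        p for p in points
        if len(p["colors"]) >= 2 and color_spread(p["colors"]) <= max_spread
    ]


# Hypothetical input: one point on a real surface, one reconstruction outlier.
points = [
    {"position": (0.0, 0.0, 1.0),
     "colors": [(200, 100, 50), (205, 98, 52), (198, 103, 49)]},
    {"position": (0.0, 0.0, 5.0),
     "colors": [(200, 100, 50), (20, 30, 220), (90, 200, 10)]},
]

kept = remove_outliers(points)
print([p["position"] for p in kept])
```

A point that lies on a real, roughly diffuse surface projects to similar colors in every view that sees it, whereas a misplaced point projects onto unrelated scene content and fails the test. A practical filter would also need to account for occlusion and view-dependent shading, which this sketch ignores.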
