Paper information - Acquisition, processing and display for 3D live-action cinema and television

Three-dimensional cinema and television present a separate image to each of the viewer's eyes in order to evoke depth perception, which gives filmmakers an additional cue to aid their storytelling. Current acquisition and manipulation approaches make it difficult to exploit this additional depth dimension effectively. In this thesis we examine the pipeline of acquisition, processing and display, and propose methods that make the depth dimension easier to exploit while also aiming to improve the quality of the three-dimensional viewing experience.

Computing a depth value for each pixel in the video images of a captured scene is a difficult task. We propose an acquisition system in which a central, high-quality film camera is supported by additional satellite sensors. Rather than using sensors of a single modality, e.g. visible-light cameras, we propose to use additional modalities: besides lower-quality visible-light cameras, we also incorporate a Time-of-Flight depth camera and a thermal camera. By combining sensors of different modalities we aim to provide more information for computing per-pixel depth. The satellite cameras allow for better occlusion reasoning about the scene. The depth camera provides a direct measure of scene depth, albeit at low resolution. Finally, the thermal camera provides information for discerning between scene elements that are imaged as regions with similar colors. We propose a method to combine the information from these modalities and demonstrate that it can compute high-quality depth maps.

Since we are dealing with motion pictures, it is not sufficient to compute depth for a single instant in time; the computed depth should be temporally consistent across the video. We argue that temporally consistent depth matters most for foreground objects in a scene. We propose an interactive approach that propagates segmented foreground objects from the first and last frames of a shot to the frames in between. By grouping pixels with similar photometric and thermal properties into so-called superpixels, we reduce the complexity from per-pixel to per-superpixel. We then pose the task as a labeling problem for superpixels over time, where the label assigned to each superpixel indicates the segment to which it belongs. We show that this information can be exploited directly in the depth computation, where the segments serve as prior knowledge.
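The superpixel labeling idea can be illustrated with a toy sketch. This is not the thesis's method: it replaces the labeling formulation with a plain nearest-neighbour assignment in feature space and ignores spatial and temporal smoothness. The function name `propagate_labels`, the four-component feature layout (mean red, green, blue and thermal values scaled to [0, 1]) and all numbers are assumptions made for illustration only.

```python
from math import dist


def propagate_labels(key_superpixels, frames):
    """Label every superpixel with the label of its nearest key-frame superpixel.

    key_superpixels: list of (features, label) pairs taken from the
        user-segmented first and last frames of a shot (hypothetical layout:
        features = (red, green, blue, thermal), each scaled to [0, 1]).
    frames: list of in-between frames, each a list of feature tuples.
    Returns one list of labels per frame.
    """
    labeled_frames = []
    for frame in frames:
        labels = []
        for features in frame:
            # Pick the key-frame superpixel closest in colour + thermal space.
            nearest = min(key_superpixels, key=lambda kf: dist(kf[0], features))
            labels.append(nearest[1])
        labeled_frames.append(labels)
    return labeled_frames


# Two key-frame superpixels with identical colour but different temperature:
# a warm actor (foreground) in front of a cool wall (background).
keys = [
    ((0.80, 0.60, 0.50, 0.90), "fg"),
    ((0.80, 0.60, 0.50, 0.20), "bg"),
]
# One in-between frame containing two superpixels.
in_between = [[(0.78, 0.61, 0.50, 0.85), (0.80, 0.60, 0.52, 0.25)]]
print(propagate_labels(keys, in_between))  # prints [['fg', 'bg']]
```

In this example the two regions cannot be separated by color alone; the thermal component is what drives the assignment, which mirrors the role the abstract gives to the thermal camera.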

Jeroen van Baar | J. Baar
