Light Field Video Compression and Real Time Rendering

Light field imaging is rapidly becoming an established method for generating flexible image-based descriptions of scene appearance. Compared to classical 2D imaging techniques, the angular information included in light fields enables effects such as post-capture refocusing and exploration of the scene from different vantage points. In this paper, we describe a novel GPU pipeline for compression and real-time rendering of light field videos with full parallax. To achieve this, we employ a dictionary learning approach and train an ensemble of dictionaries capable of efficiently representing light field video data using highly sparse coefficient sets. A key novel element of our representation is that we simultaneously compress both the image data (pixel colors) and the auxiliary information (depth, disparity, or optical flow) required for view interpolation. During playback, the coefficients are streamed to the GPU, where the light field and the auxiliary information are reconstructed using the dictionary ensemble and view interpolation is performed. To realize the pipeline, we present several technical contributions, including a denoising scheme that enhances sparsity in the dataset and thereby enables higher compression ratios, and a novel pruning strategy that reduces the size of the dictionary ensemble and significantly reduces the computational complexity of encoding a light field. Our approach is independent of the light field parameterization and can be used with data from any light field video capture system. To demonstrate the usefulness of our pipeline, we use several publicly available light field video datasets and discuss the medical application of documenting heart surgery.
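To illustrate the general idea of representing data with an ensemble of dictionaries and sparse coefficients, the following is a minimal sketch in Python with NumPy. It is not the pipeline described in the paper: the greedy orthogonal matching pursuit solver, the rule of keeping the dictionary with the smallest reconstruction error, the random orthonormal dictionaries, and all function names (`omp`, `encode`, `decode`) are assumptions chosen for illustration only. Each signal is stored as a dictionary index plus a few nonzero coefficients, and is reconstructed with a single matrix-vector product, which is the kind of operation that maps well to a GPU.

```python
import numpy as np


def omp(D, x, sparsity):
    """Greedy orthogonal matching pursuit.

    Approximates x using at most `sparsity` columns (atoms) of D and
    returns the full-length coefficient vector (mostly zeros).
    """
    residual = x.copy()
    support = []
    weights = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never select the same atom twice
        atom = int(np.argmax(correlations))
        if correlations[atom] < 1e-12:
            break  # residual is already (numerically) zero
        support.append(atom)
        # Least-squares refit of x on all atoms selected so far.
        weights, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ weights
    coeffs = np.zeros(D.shape[1])
    if support:
        coeffs[support] = weights
    return coeffs


def encode(ensemble, x, sparsity):
    """Encode x with every dictionary and keep the best one.

    Returns (dictionary index, sparse coefficient vector).
    """
    best_index, best_coeffs, best_error = -1, None, np.inf
    for index, D in enumerate(ensemble):
        coeffs = omp(D, x, sparsity)
        error = np.linalg.norm(x - D @ coeffs)
        if error < best_error:
            best_index, best_coeffs, best_error = index, coeffs, error
    return best_index, best_coeffs


def decode(ensemble, index, coeffs):
    """Reconstruct a signal from its dictionary index and coefficients."""
    return ensemble[index] @ coeffs


# Toy data: two random orthonormal dictionaries standing in for a trained ensemble.
rng = np.random.default_rng(0)
ensemble = [np.linalg.qr(rng.standard_normal((16, 16)))[0] for _ in range(2)]

# A signal that is exactly 3-sparse in the second dictionary.
true_coeffs = np.zeros(16)
true_coeffs[[2, 7, 11]] = [1.5, -2.0, 0.75]
signal = ensemble[1] @ true_coeffs

index, coeffs = encode(ensemble, signal, sparsity=3)
reconstruction = decode(ensemble, index, coeffs)
print("selected dictionary:", index)
print("nonzero coefficients:", np.count_nonzero(coeffs))
print("reconstruction error:", np.linalg.norm(signal - reconstruction))
```

In this toy setting only the dictionary index and the nonzero coefficients (with their positions) would need to be stored or streamed per signal; how the paper trains, prunes, and applies its ensemble is not reproduced here.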
