Paper Information - Steered Mixture-of-Experts for Light Field Images and Video: Representation and Coding

Research in light field (LF) processing has grown substantially over the last decade, largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are represented discretely on 2-D regular grids, which are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited to high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid-range bitrates with respect to the subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with the bitrate reduced by a factor of 4 at the same quality. At least equally important, our method inherently provides functionality desirable for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) lightweight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
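To make the idea of kernels defining a continuous approximation more concrete, the following is a minimal sketch of how a kernel-based model of this kind could be evaluated. It is not the authors' implementation: the abstract does not specify the kernel form, so the sketch assumes Gaussian kernels with full covariance matrices, normalized (softmax-like) gating as in Gaussian mixture regression, and linear experts. The function name `smoe_reconstruct` and all parameter names and values are hypothetical.

```python
import numpy as np

def smoe_reconstruct(coords, centers, covs, priors, slopes, offsets):
    """Evaluate an assumed SMoE-style model at continuous coordinates.

    coords:  (N, D) query positions (D=2 for images, 4 for LFs, 5 for LF video)
    centers: (K, D) kernel centers
    covs:    (K, D, D) kernel covariance matrices (assumed Gaussian kernels)
    priors:  (K,) mixing weights
    slopes:  (K, D) gradients of the assumed linear experts
    offsets: (K,) expert values at the kernel centers
    """
    diff = coords[:, None, :] - centers[None, :, :]           # (N, K, D)
    prec = np.linalg.inv(covs)                                 # (K, D, D)
    maha = np.einsum("nkd,kde,nke->nk", diff, prec, diff)      # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(covs)                        # (K,)
    log_w = np.log(priors) - 0.5 * (maha + logdet)             # unnormalized log gates
    log_w -= log_w.max(axis=1, keepdims=True)                  # numerical stability
    gates = np.exp(log_w)
    gates /= gates.sum(axis=1, keepdims=True)                  # gates sum to 1 per query
    experts = offsets[None, :] + np.einsum("nkd,kd->nk", diff, slopes)
    return (gates * experts).sum(axis=1)                       # gate-weighted blend of experts

# Two hypothetical kernels in 2-D, each with a constant expert.
centers = np.array([[0.25, 0.5], [0.75, 0.5]])
covs = np.stack([np.diag([0.01, 0.05]), np.diag([0.01, 0.05])])
priors = np.array([0.5, 0.5])
slopes = np.zeros((2, 2))
offsets = np.array([0.2, 0.8])

# Each query position is evaluated independently of all others.
xs, ys = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
image = smoe_reconstruct(grid, centers, covs, priors, slopes, offsets).reshape(8, 8)
```

Because every query position is computed independently and the coordinates are continuous, the same function can be called on any subset of positions or on a denser grid. Under the stated assumptions, this is one way to picture the random access, pixel-parallel reconstruction, and interpolation properties listed in the abstract.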
