Spatial-Angular Attention Network for Light Field Reconstruction

Typical learning-based light field reconstruction methods rely on building a large receptive field by deepening their networks to capture correspondences between input views. In this paper, we propose a spatial-angular attention network that perceives non-local correspondences in the light field and reconstructs a high-angular-resolution light field in an end-to-end manner. Motivated by the non-local attention mechanism (Wang et al., 2018; Zhang et al., 2019), we introduce a spatial-angular attention module designed specifically for high-dimensional light field data. It computes the response of each query pixel from all positions on the epipolar plane and generates an attention map that captures correspondences along the angular dimension. We then propose a multi-scale reconstruction structure that implements the non-local attention efficiently in the low-resolution feature space while preserving high-frequency components in the high-resolution feature space. Extensive experiments demonstrate the superior performance of the proposed spatial-angular attention network in reconstructing sparsely sampled light fields with non-Lambertian effects.
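As a rough illustration of the attention step described above, the following NumPy sketch applies generic non-local (self-)attention to a single epipolar-plane feature map, so that every query position attends to all spatial and angular positions on that plane. It is not the network proposed in the paper: the function name `epi_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the scaled dot-product similarity, and the residual connection are all assumptions chosen for illustration.

```python
import numpy as np

def epi_attention(feat, w_q, w_k, w_v):
    """Generic non-local attention over one epipolar-plane feature map.

    feat : (A, W, C) array with A angular positions (views), W spatial
           positions, and C feature channels.
    w_q, w_k : (C, D) query/key projections (hypothetical parameters).
    w_v : (C, C) value projection (hypothetical parameter).
    Returns (out, attn): out has shape (A, W, C), attn has shape (A*W, A*W).
    """
    a, w, c = feat.shape
    x = feat.reshape(a * w, c)            # one row per position on the epipolar plane
    q, k, v = x @ w_q, x @ w_k, x @ w_v   # linear query/key/value projections

    # Similarity of every query position to every position on the plane.
    # Scaling by sqrt(D) is an assumption borrowed from scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[1])

    # Numerically stable row-wise softmax: each row is one query's attention map.
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)

    # Aggregate values from all positions and add a residual connection (assumed).
    out = x + attn @ v
    return out.reshape(a, w, c), attn


# Usage on random data: 3 views, 8 spatial positions, 4 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 4))
w_q = rng.standard_normal((4, 2))
w_k = rng.standard_normal((4, 2))
w_v = rng.standard_normal((4, 4))
out, attn = epi_attention(feat, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

The attention map in this sketch has (A·W)² entries, so its cost grows quadratically with the size of the epipolar plane. This is consistent with the motivation given above for computing the attention in a low-resolution feature space.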

[1]  Paul E. Debevec,et al.  A system for acquiring, processing, and rendering panoramic light field stills for virtual reality , 2018, ACM Trans. Graph..

[2]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[3]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[4]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[6]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[7]  Sven Wanner,et al.  Variational Light Field Analysis for Disparity Estimation and Super-Resolution , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Hans-Peter Seidel,et al.  Towards a Quality Metric for Dense Light Fields , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Sam Kwong,et al.  Learning Light Field Angular Super-Resolution via a Geometry-Aware Network , 2020, AAAI.

[11]  Yulan Guo,et al.  Parallax Attention for Unsupervised Stereo Correspondence Learning , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Robert Bregovic,et al.  Light Field Reconstruction Using Shearlet Transform , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Harry Shum,et al.  Plenoptic sampling , 2000, SIGGRAPH.

[15]  Jingyi Yu,et al.  Light Field Super-resolution via Attention-Guided Fusion of Hybrid Lenses , 2019, ACM Multimedia.

[16]  Hongdong Li,et al.  Revisiting Spatio-Angular Trade-off in Light Field Cameras and Extended Applications in Super-Resolution , 2019, IEEE Transactions on Visualization and Computer Graphics.

[17]  Yan Huang,et al.  Multi-Angular Epipolar Geometry Based Light Field Angular Reconstruction Network , 2020, IEEE Transactions on Computational Imaging.

[18]  Marc Levoy,et al.  Light field rendering , 1996, SIGGRAPH.

[19]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[20]  Bolei Zhou,et al.  Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.

[21]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[22]  Jitendra Malik,et al.  Depth from Combining Defocus and Correspondence Using Light-Field Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Frédo Durand,et al.  Light Field Reconstruction Using Sparsity in the Continuous Fourier Domain , 2014, ACM Trans. Graph..

[24]  Ronald A. Rensink.  The Dynamic Representation of Scenes , 2000 .

[25]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[26]  Michael Unser,et al.  Deep Convolutional Neural Network for Inverse Problems in Imaging , 2016, IEEE Transactions on Image Processing.

[27]  Ting-Chun Wang,et al.  Learning-based view synthesis for light field cameras , 2016, ACM Trans. Graph..

[28]  Ravi Ramamoorthi,et al.  Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines , 2019 .

[29]  M. Corbetta,et al.  Control of goal-directed and stimulus-driven attention in the brain , 2002, Nature Reviews Neuroscience.

[30]  Yeong Jun Koh,et al.  Light Field Super-Resolution via Adaptive Feature Remixing , 2021, IEEE Transactions on Image Processing.

[31]  Lu Fang,et al.  CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping , 2018, ECCV.

[32]  Fei Wu,et al.  FcaNet: Frequency Channel Attention Networks , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Qionghai Dai,et al.  Learning Sheared EPI Structure for Light Field Reconstruction , 2019, IEEE Transactions on Image Processing.

[34]  Wei An,et al.  Learning Parallax Attention for Stereo Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xiaoming Chen,et al.  Fast Light Field Reconstruction with Deep Coarse-to-Fine Modeling of Spatial-Angular Clues , 2018, ECCV.

[36]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[37]  Christine Guillemot,et al.  Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Tieniu Tan,et al.  End-to-End View Synthesis for Light Field Imaging with Pseudo 4DCNN , 2018, ECCV.

[39]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[40]  Shiyu Chang,et al.  TransGAN: Two Transformers Can Make One Strong GAN , 2021, ArXiv.

[41]  Zhenan Sun,et al.  High-fidelity View Synthesis for Light Field Imaging With Extended Pseudo 4DCNN , 2020, IEEE Transactions on Computational Imaging.

[42]  Li Zhang,et al.  Soft 3D reconstruction for view synthesis , 2017, ACM Trans. Graph..

[43]  Gabriel Eilertsen,et al.  HDR image reconstruction from a single exposure using deep CNNs , 2017, ACM Trans. Graph..

[44]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[45]  Alexei A. Efros,et al.  Occlusion-Aware Depth Estimation Using Light-Field Cameras , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Ren Ng.  Fourier slice photography , 2005, ACM Trans. Graph..

[47]  Chao-Tsung Huang.  Robust Pseudo Random Fields for Light-Field Stereo Matching , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[48]  Trevor Darrell,et al.  Do Convnets Learn Correspondence? , 2014, NIPS.

[49]  Jie Chen,et al.  Deep Spatial-Angular Regularization for Compressive Light Field Reconstruction over Coded Apertures , 2020, ECCV.

[50]  Edmund Y Lam,et al.  Light Field View Synthesis via Aperture Disparity and Warping Confidence Map , 2021, IEEE Transactions on Image Processing.

[51]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Qionghai Dai,et al.  Light Field Reconstruction Using Convolutional Network on EPI and Extended Applications , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Edmund Y. Lam,et al.  High-Dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Lina Yao,et al.  Deep Learning Based Recommender System , 2017, ACM Comput. Surv..

[56]  Tsuhan Chen,et al.  Spectral analysis for sampling image-based rendering data , 2003, IEEE Trans. Circuits Syst. Video Technol..

[57]  Chao Li,et al.  Robust depth estimation for light field via spinning parallelogram operator , 2016, Comput. Vis. Image Underst..

[58]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[59]  Ming Ouhyoung,et al.  Attention-Based View Selection Networks for Light-Field Disparity Estimation , 2020, AAAI.

[60]  Thomas Brox,et al.  Generating Images with Perceptual Similarity Metrics based on Deep Networks , 2016, NIPS.

[61]  In-So Kweon,et al.  Learning a Deep Convolutional Network for Light-Field Image Super-Resolution , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[62]  Gaochang Wu,et al.  Revisiting Light Field Rendering With Deep Anti-Aliasing Neural Network , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Hairong Qi,et al.  Image Super-Resolution by Neural Texture Transfer , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  A. Gotchev,et al.  CIVIT DATASETS: HORIZONTAL-PARALLAX-ONLY DENSELY-SAMPLED LIGHT-FIELDS , 2020 .

[65]  Graham Fyffe,et al.  Stereo Magnification: Learning View Synthesis using Multiplane Images , 2018, ArXiv.