Cross Parallax Attention Network for Stereo Image Super-Resolution

Stereo super-resolution (SR) aims to enhance the spatial resolution of one camera view using additional information from the other view. Previous deep-learning-based stereo SR methods improve SR performance by exploiting this additional information, but they cannot super-resolve stereo images with large disparities or with different types of epipolar lines. Moreover, in these methods a single model can super-resolve only one particular view at one specific scale factor. This paper proposes a cross parallax attention stereo super-resolution network (CPASSRnet) that performs stereo SR at multiple scale factors for both views with a single model. To handle large disparities and different types of epipolar lines, a cross parallax attention module (CPAM) is presented, which captures, for each view, the global correspondence of additional information in the other view. CPAM lets the two views exchange additional information with each other according to the generated attention maps. Quantitative and qualitative comparisons with state-of-the-art methods illustrate the superiority of CPASSRnet. Ablation experiments demonstrate that the proposed components are effective, and noise tests verify the robustness of CPASSRnet.
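
The idea of two views exchanging information through attention maps computed over global correspondences can be illustrated with a small sketch. This is not the CPAM architecture itself, whose layers the abstract does not specify; it is a generic cross-view dot-product attention written under stated assumptions. Each view is assumed to be a flat list of per-position feature vectors, attention is taken over every position of the other view (rather than along one horizontal epipolar line), and the fusion is a plain residual sum. The names `cross_attention` and `exchange_views` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_feats, source_feats):
    """Attend from every query position to ALL source positions.

    query_feats, source_feats: lists of feature vectors (lists of floats)
    of equal dimension. Returns (attended, attn_map), where attn_map[i][j]
    is the weight query position i puts on source position j.
    Assumption: scaled dot-product similarity; the paper may differ.
    """
    dim = len(query_feats[0])
    scale = 1.0 / math.sqrt(dim)
    attended, attn_map = [], []
    for q in query_feats:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in source_feats]
        weights = softmax(scores)
        attn_map.append(weights)
        # Weighted sum of source features, one channel at a time.
        attended.append(
            [sum(w * k[c] for w, k in zip(weights, source_feats)) for c in range(dim)]
        )
    return attended, attn_map


def exchange_views(left_feats, right_feats):
    """Let both views borrow information from each other.

    Assumption: residual fusion (own features + attended features).
    """
    from_right, attn_left = cross_attention(left_feats, right_feats)
    from_left, attn_right = cross_attention(right_feats, left_feats)
    fused_left = [[a + b for a, b in zip(f, g)] for f, g in zip(left_feats, from_right)]
    fused_right = [[a + b for a, b in zip(f, g)] for f, g in zip(right_feats, from_left)]
    return fused_left, fused_right, attn_left, attn_right


if __name__ == "__main__":
    # Toy features: the right view is the left view with positions swapped.
    left = [[4.0, 0.0], [0.0, 4.0]]
    right = [[0.0, 4.0], [4.0, 0.0]]
    fused_left, fused_right, attn_left, attn_right = exchange_views(left, right)
    print(attn_left)
```

Because the same function serves both directions, one set of operations handles the left and right views alike, and the attention places no restriction on where in the other view a match may lie.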
