Efficient Video Enhancement Transformer

Video Enhancement is an important computer vision task that aims to remove artifacts from lossy compressed video and to improve its visual quality through photo-realistic restoration of the video content. Decades of research have produced a multitude of efficient algorithms, which reduce the memory footprint of transmitted video content across a continuously growing network of video streaming services. In this work, we propose VETRAN, a low-latency, real-time, online Video Enhancement TRANsformer based on spatial and temporal attention mechanisms. We validate our method on recent Video Enhancement benchmarks from the NTIRE and AIM challenges, i.e., REDS/REDS4, LDV, and IntVID. We improve over the compared state-of-the-art methods both quantitatively and qualitatively while maintaining a low inference time.
