Paper Information - Efficient Video Enhancement Transformer

Video enhancement is an important computer vision task that aims to remove artifacts from lossy compressed video and to improve its visual properties through photo-realistic restoration of the video content. Decades of research have produced a multitude of efficient algorithms, enabling a reduction in the memory footprint of video content transferred across a continuously growing network of video streaming services. In this work, we propose VETRAN, a low-latency, real-time, online Video Enhancement TRANsformer based on spatial and temporal attention mechanisms. We validate our method on recent NTIRE and AIM video enhancement challenge benchmarks, namely REDS/REDS4, LDV, and IntVID. We improve over the compared state-of-the-art methods both quantitatively and qualitatively, while maintaining a low inference time.

R. Timofte | Florin-Alexandru Vasluianu
