Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report

Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based video super-resolution solutions that can achieve a real-time performance on mobile GPUs. The participants were provided with the REDS dataset and trained their models to do an efficient 4X video upscaling. The runtime of all models was evaluated on the OPPO Find X2 smartphone with the Snapdragon 865 SoC capable of accelerating floating-point networks on its Adreno GPU. The proposed solutions are fully compatible with any mobile GPU and can upscale videos to HD resolution at up to 80 FPS while demonstrating high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.

[1]  Xu Jia,et al.  Revisiting Temporal Modeling for Video Super-resolution , 2020, BMVC.

[2]  Radu Timofte,et al.  Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3]  Bo Chen,et al.  Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Juhyun Lee,et al.  On-Device Neural Net Inference with Mobile GPUs , 2019, ArXiv.

[5]  Matthew A. Brown,et al.  Frame-Recurrent Video Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Radu Timofte,et al.  Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[7]  Radu Timofte,et al.  Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  Hanseok Ko,et al.  NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Fabien Cardinaux,et al.  Mixed Precision DNNs: All you need is a good parametrization , 2019, ICLR.

[10]  Radu Timofte,et al.  AIM 2019 Challenge on Bokeh Effect Synthesis: Methods and Results , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[11]  Dong-Wook Kim,et al.  NTIRE 2019 Challenge on Real Image Denoising: Methods and Results , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12]  Koan-Sin Tan,et al.  Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  Luc Van Gool,et al.  Replacing Mobile Camera ISP with a Single Deep Learning Model , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Kurt Keutzer,et al.  ZeroQ: A Novel Zero Shot Quantization Framework , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Radu Timofte,et al.  AIM 2020 Challenge on Learned Image Signal Processing Pipeline , 2020, ECCV Workshops.

[16]  Yuandong Tian,et al.  FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Chen Hong,et al.  NTIRE 2019 Challenge on Real Image Super-Resolution: Methods and Results , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[18]  Ning Wang,et al.  EVSRNet: Efficient Video Super-Resolution with Neural Architecture Search , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[19]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[20]  Quoc V. Le,et al.  Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Albert Gural,et al.  Trained Uniform Quantization for Accurate and Efficient Neural Network Inference on Fixed-Point Hardware , 2019, ArXiv.

[22]  L. Gool,et al.  T-Basis: a Compact Representation for Neural Networks , 2020, ICML.

[23]  Luc Van Gool,et al.  WESPE: Weakly Supervised Photo Enhancer for Digital Cameras , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Houqiang Li,et al.  Quantization Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiaoling Zhang,et al.  NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26]  Luc Van Gool,et al.  Learning Filter Basis for Convolutional Neural Network Compression , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Jie Li,et al.  AIM 2019 Challenge on RAW to RGB Mapping: Methods and Results , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[28]  Xinbo Gao,et al.  Lightweight Image Super-Resolution with Information Multi-distillation Network , 2019, ACM Multimedia.

[29]  Radu Timofte,et al.  AIM 2020 Challenge on Rendering Realistic Bokeh , 2020, ECCV Workshops.

[30]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Radu Timofte,et al.  Rendering Natural Camera Bokeh Effect with Deep Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[32]  Radu Timofte,et al.  NTIRE 2019 Challenge on Video Deblurring and Super-Resolution: Dataset and Study , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[33]  Jae-Seok Choi,et al.  A Deep Convolutional Neural Network with Selection Units for Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[34]  Fahad Shahbaz Khan,et al.  NTIRE 2019 Challenge on Image Enhancement: Methods and Results , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Yuandong Tian,et al.  FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Luc Van Gool,et al.  PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report , 2018, ECCV Workshops.

[38]  Ke Wang,et al.  AI Benchmark: Running Deep Neural Networks on Android Smartphones , 2018, ECCV Workshops.

[39]  Radu Timofte,et al.  NTIRE 2020 Challenge on Image and Video Deblurring , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[40]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[41]  Radu Timofte,et al.  Efficient Video Super-Resolution through Recurrent Latent Space Propagation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[42]  Bohyung Han,et al.  Fine-Grained Neural Architecture Search , 2019, ArXiv.

[43]  Chen Change Loy,et al.  BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Luc Van Gool,et al.  DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Luc Van Gool,et al.  NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[46]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[47]  Radu Timofte,et al.  Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[48]  Wei Liu,et al.  Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm , 2018, ECCV.

[49]  Andrey Ignatov,et al.  Controlling Information Capacity of Binary Neural Network , 2020, Pattern Recognit. Lett..

[50]  Radu Timofte,et al.  Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge: Report , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[51]  A. Murat Tekalp,et al.  NTIRE 2019 Challenge on Video Super-Resolution: Methods and Results , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[52]  Bo Chen,et al.  MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Xiangyu Zhang,et al.  MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[54]  Luc Van Gool,et al.  AI Benchmark: All About Deep Learning on Smartphones in 2019 , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[55]  Jungong Han,et al.  ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).