Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning

Most of the existing works in supervised spatio-temporal video super-resolution (STVSR) heavily rely on a large-scale external dataset consisting of paired low-resolution low-frame rate (LR-LFR) and high-resolution high-frame-rate (HR-HFR) videos. Despite their remarkable performance, these methods make a prior assumption that the low-resolution video is obtained by down-scaling the high-resolution video using a known degradation kernel, which does not hold in practical settings. Another problem with these methods is that they cannot exploit instance-specific internal information of a video at testing time. Recently, deep internal learning approaches have gained attention due to their ability to utilize the instance-specific statistics of a video. However, these methods have a large inference time as they require thousands of gradient updates to learn the intrinsic structure of the data. In this work, we present Adaptive VideoSuper-Resolution (Ada-VSR) which leverages external, as well as internal, information through meta-transfer learning and internal learning, respectively. Specifically, meta-learning is employed to obtain adaptive parameters, using a large-scale external dataset, that can adapt quickly to the novel condition (degradation model) of the given test video during the internal learning task, thereby exploiting external and internal information of a video for super-resolution. The model trained using our approach can quickly adapt to a specific video condition with only a few gradient updates, which reduces the inference time significantly. Extensive experiments on standard datasets demonstrate that our method performs favorably against various state-of-the-art approaches.

[1]  Deqing Sun,et al.  Blind Image Deblurring Using Dark Channel Prior , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kyoung Mu Lee,et al.  Enhanced Deep Residual Networks for Single Image Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3]  Michal Irani,et al.  Nonparametric Blind Super-resolution , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[5]  Kenji Doya,et al.  Meta-learning in Reinforcement Learning , 2003, Neural Networks.

[6]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Convolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Michael Elad,et al.  Fast and robust multiframe super resolution , 2004, IEEE Transactions on Image Processing.

[8]  Amit K. Roy-Chowdhury,et al.  ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation , 2020, ACM Multimedia.

[9]  Yu-Ting Su,et al.  Spatio-Temporal Mitosis Detection in Time-Lapse Phase-Contrast Microscopy Image Sequences: A Benchmark , 2021, IEEE Transactions on Medical Imaging.

[10]  Xiaoou Tang,et al.  Video Frame Synthesis Using Deep Voxel Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Hongdong Li,et al.  Learning Image Matching by Simply Watching Video , 2016, ECCV.

[12]  P. Belhumeur,et al.  Moving gradients: a path-based method for plausible image interpolation , 2009, SIGGRAPH 2009.

[13]  Amit K. Roy-Chowdhury,et al.  Non-Adversarial Video Synthesis with Learned Priors , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[15]  Zhiyong Gao,et al.  MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Richard Szeliski,et al.  High-quality video view interpolation using a layered representation , 2004, SIGGRAPH 2004.

[18]  Sergey Levine,et al.  Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning , 2018, ICLR.

[19]  Kyoung Mu Lee,et al.  DynaVSR: Dynamic Adaptive Blind Video Super-Resolution , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[20]  C Di Natale,et al.  The influence of spatial and temporal resolutions on the analysis of cell-cell interaction: a systematic study for time-lapse microscopy applications , 2019, Scientific Reports.

[21]  Gianmarco Ferri,et al.  Time-lapse confocal imaging datasets to assess structural and dynamic properties of subcellular nanostructures , 2018, Scientific data.

[22]  Michal Irani,et al.  Blind Super-Resolution Kernel Estimation using an Internal-GAN , 2019, NeurIPS.

[23]  Xiaoyun Zhang,et al.  Depth-Aware Video Frame Interpolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xinbo Gao,et al.  Lightweight Image Super-Resolution with Information Multi-distillation Network , 2019, ACM Multimedia.

[25]  Gregory Shakhnarovich,et al.  Deep Back-Projection Networks for Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[27]  Michal Irani,et al.  "Zero-Shot" Super-Resolution Using Deep Internal Learning , 2017, CVPR.

[28]  Pieter Abbeel,et al.  A Simple Neural Attentive Meta-Learner , 2017, ICLR.

[29]  Hang Li,et al.  Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.

[30]  Jan P. Allebach,et al.  Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Thomas L. Griffiths,et al.  Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[32]  Jan Kautz,et al.  Loss Functions for Neural Networks for Image Processing , 2015, ArXiv.

[33]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[34]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Separable Convolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  G. Venugopala Reddy,et al.  Deep Quantized Representation For Enhanced Reconstruction , 2020, 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops).

[36]  Soo Ye Kim,et al.  FISR: Deep Joint Frame Interpolation and Super-Resolution with A Multi-scale Temporal Loss , 2020, AAAI.

[37]  Bernt Schiele,et al.  Meta-Transfer Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Alan C. Bovik,et al.  Making a “Completely Blind” Image Quality Analyzer , 2013, IEEE Signal Processing Letters.

[39]  Alexandre Lacoste,et al.  TADAM: Task dependent adaptive metric for improved few-shot learning , 2018, NeurIPS.

[40]  Andrew Blake,et al.  Motion Deblurring and Super-resolution from an Image Sequence , 1996, ECCV.

[41]  Sergey Levine,et al.  Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm , 2017, ICLR.

[42]  Li Chen,et al.  Blurry Video Frame Interpolation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Wojciech Matusik,et al.  Moving gradients: a path-based method for plausible image interpolation , 2009, ACM Trans. Graph..

[44]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Wangmeng Zuo,et al.  Learning a Single Convolutional Super-Resolution Network for Multiple Degradations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Stephen C. Cain,et al.  Projection-based image registration in the presence of fixed-pattern noise , 2001, IEEE Trans. Image Process..

[47]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[48]  Zhiwei Xiong,et al.  Space-Time Video Super-Resolution Using Temporal Profiles , 2020, ACM Multimedia.

[49]  Gordon J. Berman,et al.  Measuring behavior across scales , 2017, BMC Biology.

[50]  King-Sun Fu,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Renjie Liao,et al.  Video Super-Resolution via Deep Draft-Ensemble Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[52]  Michal Irani,et al.  Space-time super-resolution from a single video , 2011, CVPR 2011.

[53]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[54]  Mubarak Shah,et al.  Task Agnostic Meta-Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Joshua B. Tenenbaum,et al.  Meta-Learning for Semi-Supervised Few-Shot Classification , 2018, ICLR.

[57]  Deqing Sun,et al.  A Bayesian approach to adaptive video super resolution , 2011, CVPR 2011.

[58]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[59]  Nam Ik Cho,et al.  Meta-Transfer Learning for Zero-Shot Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Mohammed Ghanbari,et al.  Scope of validity of PSNR in image/video quality assessment , 2008 .

[61]  Sergey Levine,et al.  Unsupervised Meta-Learning for Reinforcement Learning , 2018, ArXiv.

[62]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[63]  Wangmeng Zuo,et al.  Blind Super-Resolution With Iterative Kernel Correction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Jan Kautz,et al.  Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[65]  Soo Ye Kim,et al.  Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[66]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[67]  Seoung Wug Oh,et al.  Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[68]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[69]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.