Convolutional Block Design for Learned Fractional Downsampling

The layers of convolutional neural networks (CNNs) can alter the resolution of their inputs, but only by integer scaling factors. Many image and video processing applications, however, would benefit from resizing by a fractional factor. One example is conversion between resolutions standardized for video compression, such as from 1080p to 720p (a factor of 2/3). To address this limitation, we propose an alternative building block: a conventional convolutional layer followed by a differentiable resizer. The convolutional layer preserves the resolution of its input, and the resizer alone performs the change in resolution. In this way, any CNN architecture can be adapted for non-integer resizing. As an application, we replace the resizing convolutional layer of a modern deep downsampling model with the proposed building block and apply the resulting model to an adaptive bitrate video streaming scenario. Our experimental results on test videos show improved coding efficiency over the conventional Lanczos algorithm in terms of PSNR, SSIM, and VMAF.
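As a rough illustration of the building block described above, the following sketch chains a resolution-preserving convolution with a resizer. It is not the authors' implementation: the abstract does not say which differentiable resizer is used, so bilinear interpolation is assumed here, and the names `conv2d_same`, `resize_bilinear`, and `fractional_block` are hypothetical. It handles a single channel with plain Python lists and applies no anti-aliasing.

```python
from typing import List, Tuple

Image = List[List[float]]


def conv2d_same(image: Image, kernel: Image) -> Image:
    """Stride-1 cross-correlation with zero padding; output size equals input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def _source_coord(dst: int, src_size: int, dst_size: int) -> Tuple[int, int, float]:
    """Map an output index to two neighbouring input indices and a blend weight."""
    pos = (dst + 0.5) * src_size / dst_size - 0.5  # half-pixel centre alignment
    pos = min(max(pos, 0.0), src_size - 1.0)       # clamp at the borders
    lo = int(pos)
    hi = min(lo + 1, src_size - 1)
    return lo, hi, pos - lo


def resize_bilinear(image: Image, out_h: int, out_w: int) -> Image:
    """Bilinear resize (assumed resizer). Every output pixel is a fixed linear
    combination of input pixels, so the operation is differentiable in its input."""
    h, w = len(image), len(image[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        y0, y1, fy = _source_coord(y, h, out_h)
        for x in range(out_w):
            x0, x1, fx = _source_coord(x, w, out_w)
            top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
            bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
            out[y][x] = (1 - fy) * top + fy * bottom
    return out


def fractional_block(image: Image, kernel: Image, scale: float) -> Image:
    """Convolution keeps the resolution; the resizer alone changes it."""
    h, w = len(image), len(image[0])
    features = conv2d_same(image, kernel)
    return resize_bilinear(features, round(h * scale), round(w * scale))


# Toy usage: a 6x9 input scaled by 2/3, the same ratio as 1080p -> 720p.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
frame = [[float(r * 9 + c) for c in range(9)] for r in range(6)]
small = fractional_block(frame, identity, 2 / 3)
print(len(small), len(small[0]))  # prints "4 6"
```

In a trainable setting the kernel would be a learned parameter, and gradients would flow through the resizer back to it because the resizer is linear in its input.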
