Spatiotemporal Symmetric Convolutional Neural Network for Video Bit-Depth Enhancement

Human eyes are highly sensitive to dynamic range, and modern display devices have advanced rapidly in this respect, yet mainstream multimedia sources are generally stored at relatively low bit depths (BDs). BD enhancement (BDE), which attempts to transform low-BD multimedia sources into high-BD sources, is therefore of significant research value. Current BDE algorithms operate on single images rather than videos. For video content, however, the temporal continuity among frames should also be exploited. In this paper, we propose a spatiotemporal symmetric BDE network for videos based on an encoder–decoder network. Consecutive frames are input into five subnets in the encoder, where the convolutional filters in temporally symmetric subnets share the same weights to reduce model complexity. In addition, symmetric skip connections are introduced between the corresponding convolutional and deconvolutional layers of the encoder and decoder to pass features forward and alleviate the gradient diffusion problem. Experimental results show that our model effectively eliminates false contours and chroma distortions, and that it significantly outperforms state-of-the-art image BDE algorithms and single-frame baseline models in terms of PSNR and SSIM.
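The weight sharing among temporally symmetric subnets can be illustrated with a minimal sketch. It is not the network described in this paper: it assumes the five subnets correspond to frames t-2 through t+2, reduces each subnet to a single 1-D filter applied to one row of pixels, and uses made-up filter values. The names `conv1d`, `FILTERS`, `encode`, and `param_counts` are hypothetical and exist only for this illustration.

```python
# Illustrative sketch only: five per-frame "subnets", where frames at the
# same temporal distance from the center frame reuse one filter.
# Filter values and the 1-D simplification are assumptions, not the paper's.

def conv1d(signal, kernel):
    """Valid-mode 1-D correlation of a list of floats with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# One filter per temporal distance |offset| from the center frame.
# Three filters therefore serve all five subnets.
FILTERS = {
    0: [0.25, 0.5, 0.25],  # center frame t
    1: [0.2, 0.6, 0.2],    # shared by frames t-1 and t+1
    2: [0.1, 0.8, 0.1],    # shared by frames t-2 and t+2
}

OFFSETS = [-2, -1, 0, 1, 2]

def encode(frames):
    """Apply the shared filters to five rows ordered t-2 .. t+2."""
    if len(frames) != len(OFFSETS):
        raise ValueError("expected exactly five frames")
    return [
        conv1d(frame, FILTERS[abs(offset)])
        for frame, offset in zip(frames, OFFSETS)
    ]

def param_counts():
    """Return (shared, unshared) filter parameter counts for this toy setup."""
    shared = sum(len(kernel) for kernel in FILTERS.values())
    unshared = sum(len(FILTERS[abs(offset)]) for offset in OFFSETS)
    return shared, unshared
```

In this toy setup, identical inputs placed at frames t-1 and t+1 yield identical features because both pass through the same filter, and `param_counts()` shows that sharing stores three filters where five independent subnets would store five.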
