Recognizing Compressed Videos: Challenges and Promises

This paper studies the effect of quality degradation, caused by lossy video compression, on video recognition. We investigate how the state of the art video enhancement restores the video quality needed for an effective video recognition. Furthermore, we study the impact of various enhancement objectives, namely pixel-level, feature-level, and adversarial, on action recognition performance. Our experiments demonstrate that the models trained on pixel-level loss perform well in terms of visual quality but they hurt the accuracy of action recognition due to over smoothing discriminative features. On the other hand, models trained on perceptual and adversarial loss types not only generate better perceptual quality but also further improve the action recognition performance.

[1]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[2]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[3]  Bo Yan,et al.  An efficient deep convolutional neural networks model for compressed image deblocking , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[4]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[5]  Tingting Wang,et al.  A Novel Deep Learning-Based Method of Improving Coding Efficiency from the Decoder-End for HEVC , 2017, 2017 Data Compression Conference (DCC).

[6]  Xiaoou Tang,et al.  Compression Artifacts Reduction by a Deep Convolutional Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Qing Ling,et al.  D3: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[10]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Zulin Wang,et al.  Multi-frame Quality Enhancement for Compressed Video , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Alexia Jolicoeur-Martineau,et al.  The relativistic discriminator: a key element missing from standard GAN , 2018, ICLR.

[13]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Bernhard Schölkopf,et al.  Spatio-Temporal Transformer Network for Video Restoration , 2018, ECCV.

[15]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[16]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Hongyang Chao,et al.  Building Dual-Domain Representations for Compression Artifacts Reduction , 2016, ECCV.

[18]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[19]  Zengfu Wang,et al.  Video Superresolution via Motion Compensation and Deep Residual Learning , 2017, IEEE Transactions on Computational Imaging.

[20]  Yutaka Satoh,et al.  Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Luc Van Gool,et al.  Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.

[22]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[23]  Alberto Del Bimbo,et al.  Deep Generative Adversarial Compression Artifact Removal , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Bernhard Schölkopf,et al.  Photorealistic Video Super Resolution , 2018, ArXiv.

[25]  Matthew A. Brown,et al.  Frame-Recurrent Video Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[28]  Zulin Wang,et al.  Decoder-side HEVC quality enhancement with scalable convolutional neural network , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[29]  Fabio Viola,et al.  The Kinetics Human Action Video Dataset , 2017, ArXiv.

[30]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Tie Liu,et al.  MFQE 2.0: A New Approach for Multi-Frame Quality Enhancement on Compressed Video , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Thomas Brox,et al.  End-to-End Learning of Video Super-Resolution with Motion Compensation , 2017, GCPR.

[33]  Zhou Wang,et al.  Multiscale structural similarity for image quality assessment , 2003, The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[34]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[35]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[36]  Wei Wang,et al.  Video Super-Resolution via Bidirectional Recurrent Convolutional Networks , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).