Ultra-Low Bitrate Video Conferencing Using Deep Image Animation

In this work we propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side. The overall system is trained in an end-to-end fashion minimizing a reconstruction error on the encoder output. Objective and subjective quality evaluation experiments demonstrate that the proposed approach provides an average bitrate reduction for the same visual quality of more than 80% compared to HEVC.

[1]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[2]  Giuseppe Valenzise,et al.  Quality Assessment of Deep-Learning-Based Image Compression , 2018, 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP).

[3]  Xinfeng Zhang,et al.  Image and Video Compression With Neural Networks: A Review , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[4]  Victor Lempitsky,et al.  Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Patrick Pérez,et al.  Deep video portraits , 2018, ACM Trans. Graph..

[6]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[8]  David Minnen,et al.  Variational image compression with a scale hyperprior , 2018, ICLR.

[9]  Luc Van Gool,et al.  Generative Adversarial Networks for Extreme Learned Image Compression , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Beatrice Pesquet-Popescu,et al.  Region-of-interest based rate control scheme for high efficiency video coding , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Xin Yang,et al.  Exposing Deep Fakes Using Inconsistent Head Poses , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Françoise J. Prêteux,et al.  Critic review on MPEG-4 face and body animation , 2002, Proceedings. International Conference on Image Processing.

[13]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[14]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Attilio Fiandrotti,et al.  Enhancing HEVC Spatial Prediction by Context-based Learning , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  Sugato Chakravarty,et al.  Methodology for the subjective assessment of the quality of television pictures , 1995 .

[17]  Alexei A. Efros,et al.  Everybody Dance Now , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Michael Unser,et al.  On the optimality of operator-like wavelets for sparse AR(1) processes , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Nicu Sebe,et al.  First Order Motion Model for Image Animation , 2020, NeurIPS.

[20]  Nicu Sebe,et al.  Animating Arbitrary Objects via Deep Motion Transfer , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Andreas Spanias,et al.  Speech coding: a tutorial review , 1994, Proc. IEEE.

[22]  Lubomir D. Bourdev,et al.  Real-Time Adaptive Image Compression , 2017, ICML.

[23]  Gary J. Sullivan,et al.  Overview of the High Efficiency Video Coding (HEVC) Standard , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[24]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[25]  Steve Branson,et al.  Learned Video Compression , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).