Collaborative Scalable Visual Compression for Human-Centered Videos

Machine intelligence systems are increasingly deployed in real-world settings, yet conventional video coding schemes, which are designed for human viewing, are inefficient when embedded in large-scale systems and do not readily support a wide range of downstream applications. There is therefore an urgent demand for a new generation of compression frameworks that encode visual data efficiently while jointly optimizing compression and analytics for both machine vision and human perception. To this end, we propose a novel visual compression framework that collaboratively provides visual content at different granularities for both human and machine vision tasks. The proposed scalable framework preserves critical semantic information in a base layer, so that it can support accurate machine vision analysis under tight bit-rate constraints. The framework is scalable, providing visual representations of different granularities to support various kinds of tasks, including video reconstruction for human viewing. Experimental results on human-centered videos demonstrate the promising functionality of scalable visual coding, with improved efficiency for both high-performance machine analysis and human perception.
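The abstract does not say how the layers are coded. As a hedged illustration of the general idea of a scalable bitstream (a base layer that is decodable on its own, plus an enhancement layer that refines it), here is a minimal two-layer quantization sketch in Python. It deliberately swaps the paper's semantic base layer for plain coarse quantization of a 1-D signal, so it is not the authors' method; `ScalableBitstream`, `encode`, `decode`, and the step sizes are hypothetical names and values chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalableBitstream:
    # Hypothetical container: NOT the paper's bitstream format.
    base: List[int]         # base-layer indices (coarse quantization)
    enhancement: List[int]  # enhancement-layer indices (quantized residual)
    base_step: float
    enh_step: float


def encode(signal: List[float], base_step: float = 8.0, enh_step: float = 1.0) -> ScalableBitstream:
    # Base layer: coarse uniform quantization of each sample.
    base = [round(x / base_step) for x in signal]
    coarse = [b * base_step for b in base]
    # Enhancement layer: finer quantization of what the base layer missed.
    enhancement = [round((x - c) / enh_step) for x, c in zip(signal, coarse)]
    return ScalableBitstream(base, enhancement, base_step, enh_step)


def decode(bs: ScalableBitstream, use_enhancement: bool) -> List[float]:
    # Base-only decoding needs just the base layer.
    coarse = [b * bs.base_step for b in bs.base]
    if not use_enhancement:
        return coarse
    # Full decoding adds the dequantized residual on top of the base layer.
    return [c + e * bs.enh_step for c, e in zip(coarse, bs.enhancement)]


if __name__ == "__main__":
    signal = [3.2, 17.9, 40.1, 5.5]
    bs = encode(signal)
    print(decode(bs, use_enhancement=False))  # coarse reconstruction
    print(decode(bs, use_enhancement=True))   # refined reconstruction
```

In this toy setting, a decoder serving a machine task could stop after the base layer (per-sample error at most `base_step / 2`), while a decoder serving human viewing also applies the enhancement residual (per-sample error at most `enh_step / 2`).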
