Human-Machine Collaborative Video Coding Through Cuboidal Partitioning

Video coding algorithms encode and decode an entire video frame, while feature coding techniques preserve and communicate only the most critical information needed for a given application. This is because video coding targets human perception, while feature coding targets machine vision tasks. Recently, attempts have been made to bridge the gap between these two domains. In this work, we propose a video coding framework that leverages the commonality between human vision and machine vision applications using cuboids. Cuboids, estimated rectangular regions over a video frame, are computationally efficient, have a compact representation, and are object centric. Such properties have already been shown to add value to traditional video coding systems. Here, cuboidal feature descriptors are extracted from the current frame and then employed to accomplish a machine vision task in the form of object detection. Experimental results show that a trained classifier yields superior average precision when equipped with a cuboidal-feature-oriented representation of the current test frame. Additionally, this representation costs 7% less in bit rate if the captured frames need to be communicated to a receiver.
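To make the idea of cuboids as "estimated rectangular regions over a video frame" concrete, the following is a minimal sketch of one way such a partition and its descriptors could be computed. The abstract does not specify the estimation procedure, so everything here is an assumption for illustration: the greedy variance-reducing split, the mean-intensity descriptor, and the names `sse`, `best_split`, `partition`, and `descriptors` are all hypothetical and should not be read as the authors' method.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]            # grayscale frame, row-major
Cuboid = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def sse(frame: Frame, c: Cuboid) -> float:
    """Sum of squared deviations from the mean intensity inside cuboid c."""
    top, left, bottom, right = c
    vals = [frame[r][k] for r in range(top, bottom) for k in range(left, right)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def best_split(frame: Frame, c: Cuboid) -> Optional[Tuple[float, Cuboid, Cuboid]]:
    """Best axis-aligned split of c as (error reduction, child_a, child_b), or None."""
    top, left, bottom, right = c
    parent = sse(frame, c)
    # Candidate horizontal cuts, then candidate vertical cuts.
    pairs = [((top, left, r, right), (r, left, bottom, right))
             for r in range(top + 1, bottom)]
    pairs += [((top, left, bottom, k), (top, k, bottom, right))
              for k in range(left + 1, right)]
    best = None
    for a, b in pairs:
        gain = parent - sse(frame, a) - sse(frame, b)
        if best is None or gain > best[0]:
            best = (gain, a, b)
    return best


def partition(frame: Frame, n_cuboids: int) -> List[Cuboid]:
    """Greedily split the cuboid whose best cut reduces the error the most."""
    cuboids: List[Cuboid] = [(0, 0, len(frame), len(frame[0]))]
    while len(cuboids) < n_cuboids:
        candidates = []
        for i, c in enumerate(cuboids):
            split = best_split(frame, c)
            if split is not None:
                candidates.append((split[0], i, split[1], split[2]))
        if not candidates:           # only 1x1 cuboids remain
            break
        _, i, a, b = max(candidates, key=lambda t: t[0])
        cuboids[i:i + 1] = [a, b]
    return cuboids


def descriptors(frame: Frame, cuboids: List[Cuboid]) -> List[Tuple[int, int, int, int, float]]:
    """Assumed descriptor per cuboid: (top, left, height, width, mean intensity)."""
    out = []
    for top, left, bottom, right in cuboids:
        vals = [frame[r][k] for r in range(top, bottom) for k in range(left, right)]
        out.append((top, left, bottom - top, right - left, sum(vals) / len(vals)))
    return out
```

In this sketch each cuboid is summarized by five numbers, which illustrates why such a representation can be compact, and a bright object on a flat background tends to be isolated by a small number of cuts, which illustrates the object-centric property. A detector could then be trained on the descriptors or on the coarse frame they reconstruct; that step is outside the scope of this sketch.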
