Region of Interest Encoding in Video Conference Systems

In this paper, we present a region of interest encoding system for video conference applications. We exploit the fact that the main focus in a typical video conference lies on the participating persons in order to save bit-rate in less interesting parts of the video. A Viola-Jones face detector is used to detect the regions of interest. Once a region of interest has been detected, it is tracked across consecutive frames. To represent the detected regions of interest, we use a quality map at the macro-block level. This map allows the encoder to choose its quantization parameter individually for each macro-block. Furthermore, we propose a scene composition concept that is based solely on the detected regions of interest, so that the visual quantization artifacts introduced by the encoder outside these regions become irrelevant. Experiments on recorded conference sequences demonstrate the bit-rate savings that can be achieved with the proposed system.

Keywords: region of interest coding; object detection; object tracking; scene composition; video-conferencing
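The macro-block quality map mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_qp_map`, the 16x16 macro-block size, and the two quantization parameter values are assumptions chosen only for illustration, and the bounding boxes stand in for the output of a face detector and tracker.

```python
MB_SIZE = 16  # assumed macro-block size in luma samples (not stated in the abstract)


def build_qp_map(width, height, rois, qp_roi=26, qp_background=38):
    """Return a 2D list holding one quantization parameter per macro-block.

    rois is a list of (x, y, w, h) boxes in pixels, e.g. tracked faces.
    qp_roi and qp_background are hypothetical values; a lower QP means
    finer quantization and therefore higher quality.
    """
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE   # round up to cover the frame
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    qp_map = [[qp_background] * mb_cols for _ in range(mb_rows)]

    for x, y, w, h in rois:
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        # Every macro-block touched by the box is treated as part of the ROI.
        first_col = max(x // MB_SIZE, 0)
        first_row = max(y // MB_SIZE, 0)
        last_col = min((x + w - 1) // MB_SIZE, mb_cols - 1)
        last_row = min((y + h - 1) // MB_SIZE, mb_rows - 1)
        for row in range(first_row, last_row + 1):
            for col in range(first_col, last_col + 1):
                qp_map[row][col] = qp_roi
    return qp_map


if __name__ == "__main__":
    # One hypothetical face box in a 64x48 frame.
    for row in build_qp_map(64, 48, [(20, 10, 20, 20)]):
        print(row)
```

In this sketch, a macro-block that is only partially covered by a box still receives the region-of-interest quality; an actual system might instead weight blocks by coverage or smooth the transition at the region boundary.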
