Perceptual Video Compression: A Survey

With advances in understanding the perceptual properties of the human visual system and in constructing computational models of them, efforts to incorporate human perceptual mechanisms into video compression to achieve maximal perceptual quality have received great attention. This paper reviews recent advances in perceptual video compression in terms of three major components: perceptual model definition, implementation of coding, and performance evaluation. Open research issues and challenges are also discussed to provide perspectives on future research trends.
