Automatic attention-based prioritization of unconstrained video for compression

We apply a biologically motivated algorithm that selects visually salient regions of interest in video streams to multiply-foveated video compression. Regions of high encoding priority are selected by nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex. A dynamic foveation filter then blurs (foveates) every frame, with blur increasing with distance from the high-priority regions. Two variants of the model are validated against eye fixations from 4 to 6 human observers on 50 video clips (synthetic stimuli, video games, daytime and nighttime outdoor home video, television newscasts, sports, talk shows, etc.). In the first variant, blur varies continuously in proportion to saliency at every pixel; in the second, blur is proportional to distance from three independent foveation centers. Significant overlap between human and algorithmic foveations is found on every clip with one variant, and on 48 of the 50 clips with the other. Foveated clips compress to substantially smaller files than unfoveated clips, with file size reduced by a factor of 0.5 on average. These results suggest that the algorithm is generally useful for improving compression ratios of unconstrained video.
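The distance-based variant described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a grayscale frame stored as a list of rows, uses a simple box filter in place of whatever blur kernel the original system used, and takes the foveation centers as given rather than computing them from a saliency model. The function names and the `max_radius` and `falloff` parameters are hypothetical, chosen only for illustration.

```python
import math


def blur_radius_map(width, height, centers, max_radius=4, falloff=0.1):
    """Per-pixel blur radius that grows with distance to the nearest center.

    `max_radius` and `falloff` are illustrative parameters, not values
    taken from the paper.
    """
    radii = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance to the closest of the foveation centers.
            d = min(math.hypot(x - cx, y - cy) for cx, cy in centers)
            row.append(min(max_radius, int(d * falloff)))
        radii.append(row)
    return radii


def foveate(frame, centers, max_radius=4, falloff=0.1):
    """Blur a grayscale frame with a box filter whose radius varies per pixel.

    Pixels at a foveation center get radius 0 and are left untouched;
    pixels far from every center are averaged over a larger window.
    """
    height, width = len(frame), len(frame[0])
    radii = blur_radius_map(width, height, centers, max_radius, falloff)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            r = radii[y][x]
            ys = range(max(0, y - r), min(height, y + r + 1))
            xs = range(max(0, x - r), min(width, x + r + 1))
            total = sum(frame[j][i] for j in ys for i in xs)
            row.append(total / (len(ys) * len(xs)))
        out.append(row)
    return out


if __name__ == "__main__":
    # A 20x20 checkerboard stands in for a detailed video frame.
    frame = [[255 if (x + y) % 2 == 0 else 0 for x in range(20)]
             for y in range(20)]
    # Three hypothetical foveation centers, as in the second model variant.
    blurred = foveate(frame, [(0, 0), (19, 0), (10, 19)])
    print(blurred[0][0], blurred[10][8])
```

Blurring removes fine detail away from the foveation centers, which is what lets a standard encoder spend fewer bits there. The sketch leaves the encoder itself out of scope.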
