Perceptual Flicker Visibility Prediction Model

The mere presence of spatiotemporal distortions in digital videos does not necessarily imply quality degradation, since distortion visibility can be strongly reduced by the perceptual phenomenon of visual masking. Flicker is a particularly annoying impairment that can arise from a variety of distortion processes, yet it can also be suppressed by masking. We propose a perceptual flicker visibility prediction model based on a recently discovered visual change silencing phenomenon. The proposed model predicts flicker visibility on both static and moving regions without any need for content-dependent thresholds. Using a simple model of cortical responses to video flicker, an energy model of motion perception, and a divisive normalization stage, the system captures the local spectral signatures of flicker distortions and predicts perceptual flicker visibility. The model not only predicts silenced flicker distortions in the presence of motion, but also provides a pixel-wise flicker visibility index. Results show that the predicted flicker visibility correlates well with human percepts of flicker distortions on the LIVE Flicker Video Database, and that the model is highly competitive with current flicker visibility prediction methods.

Introduction

Digital videos are increasingly pervasive due to the rapid proliferation of video streaming services, video sharing on social networks, and the global growth of mobile video traffic [1], [2]. The dramatic growth of digital video and user demand for high-quality video have necessitated the development of precise automatic perceptual video quality assessment (VQA) tools to help provide satisfactory levels of Quality of Experience (QoE) to the end user [3]. To achieve optimal video quality under limited bandwidth and power consumption, video coding technologies commonly employ lossy coding schemes, which introduce compression artifacts that can degrade perceptual video quality [4]. In addition, compressed videos can suffer from transmission distortions, including packet losses and playback interruptions triggered by channel throughput fluctuations. Since humans are generally the ultimate arbiters of the received videos, predicting and reducing perceptual visual distortions in compressed digital videos is of great interest [5].

Researchers have performed a large number of subjective studies to understand the essential factors that influence video quality, by analyzing compression artifacts and transmission distortions of compressed videos [6], by investigating dynamic time-varying distortions [7], and by probing the time-varying subjective quality of rate-adaptive videos [8]. Substantial progress has also been made toward understanding and modeling low-level visual processes in the vision system, extending from the retina to primary visual cortex and extra-striate cortex [9]. As a result, perceptual models of disruptions to natural scene statistics [10] and of visual masking [11] have been widely applied to predict perceptual visual quality. Spatial distortions are effectively predicted by VQA algorithms such as SSIM [12], VQM [13], MOVIE [14], STRRED [15], and Video-BLIINDS [16]. Spatial masking is well modeled in modern perceptual image and video quality assessment tools, video compression, and watermarking. However, temporal visual masking remains poorly modeled, although one form of it has been observed to occur near scene changes [17] and has been exploited in early video compression methods [18]-[20].
Among temporal distortions, flicker distortion is particularly challenging to predict and often occurs in low bit-rate compressed videos. Flicker distortion is a spatially local or global temporal fluctuation of luminance or chrominance in videos. Local flicker arises mainly from coarse quantization, varying prediction modes, mismatches between inter-frame blocks, improper deinterlacing, and dynamic rate changes caused by adaptive rate control methods [21]-[25]. Mosquito noise and stationary-area fluctuations are also often categorized as local flicker. Mosquito noise is a joint effect of object motion and time-varying spatial artifacts, such as ringing and motion prediction errors, near high-contrast sharp edges or moving objects, while stationary-area fluctuations result from different prediction types, quantization levels, or a combination of these factors on static regions [4], [21].

Current flicker visibility prediction methods that operate on a compressed video measure the Sum of Squared Differences (SSD) between the block difference of the original video and the block difference of the compressed video, where the block difference is computed between successive frames on macroblocks. When the sum of squared block differences of the original video falls below a threshold, a static region is indicated [22]. The ratio between the luminance fluctuation in the compressed video and that in the original video has also been used [23]. To improve the prediction of flicker-prone blocks, a normalized fraction model was proposed [24], in which the difference of the SSDs of the original and compressed block differences is divided by the sum of the SSDs. These methods have the virtue of simplicity, but the resulting flicker prediction performance is limited and content-dependent. Another method included the influence of motion on flicker prediction by applying motion compensation prior to the SSD calculation [25]. The mean absolute discrete temporal derivative of the average DC coefficient of DCT blocks has also been used to measure sudden local changes (flicker) in a VQA model [16]. Current flicker prediction methods are limited to block-wise accuracy, and human visual system (HVS)-based perceptual flicker visibility, e.g., accounting for temporal visual masking, has not yet been extensively studied. (A sketch of these block-based measures is given at the end of this introduction.)

Recently, Suchow and Alvarez [26] demonstrated a striking "motion silencing" illusion, a powerful temporal visual masking phenomenon called change silencing, in which salient temporal changes of objects in luminance, color, size, and shape appear to cease in the presence of large object motions. This motion-induced failure to detect change not only suggests a tight coupling between motion and object appearance, but also reveals that commonly occurring temporal distortions such as flicker may be dramatically suppressed by the presence of motion. Physiologically plausible explanations of motion silencing have been proposed [26]-[29]. However, since the effect had only been studied on highly synthetic stimuli such as moving dots, we performed a series of human subjective studies on naturalistic videos, in which flicker visibility was observed to be strongly reduced by large coherent object motions [30]-[33]. A consistent physiological and computational model that detects motion silencing could thus be used to probe perceptual flicker visibility in compressed videos. In this paper, we propose a new perceptual flicker visibility prediction model based on motion silencing.
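To make the block-based measures of [22]-[24] concrete, the following is a minimal sketch in the spirit of the descriptions above; the function names, the 16x16 macroblock size, and the static-region threshold are illustrative assumptions rather than values taken from the cited methods.

```python
import numpy as np

def block_ssd(prev_frame, curr_frame, block=16):
    """Per-macroblock sum of squared temporal differences (SSD)."""
    diff = (curr_frame.astype(np.float64) - prev_frame.astype(np.float64)) ** 2
    h = diff.shape[0] - diff.shape[0] % block
    w = diff.shape[1] - diff.shape[1] % block   # crop to whole macroblocks
    return diff[:h, :w].reshape(h // block, block, w // block, block).sum(axis=(1, 3))

def flicker_measures(ref_prev, ref_curr, cmp_prev, cmp_curr,
                     block=16, static_thresh=100.0, eps=1e-12):
    """Block-wise flicker measures in the spirit of [22]-[24]."""
    ssd_ref = block_ssd(ref_prev, ref_curr, block)   # original block differences
    ssd_cmp = block_ssd(cmp_prev, cmp_curr, block)   # compressed block differences
    static = ssd_ref < static_thresh                 # static-region flag, as in [22]
    ratio = ssd_cmp / (ssd_ref + eps)                # fluctuation ratio, as in [23]
    norm_frac = (ssd_cmp - ssd_ref) / (ssd_cmp + ssd_ref + eps)  # normalized fraction [24]
    return static, ratio, norm_frac
```

As the text notes, each of these measures is content-dependent: the same threshold and ratio behave differently across scenes, which motivates the perceptual model developed next.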
The new perceptual flicker visibility prediction model is a significant step toward improving the performance of VQA models, since it makes possible a model of temporal masking of temporal distortions. The new model measures the bandpass responses to a reference video and a corresponding flicker video using a localized multiscale 3D space-time Gabor filter bank [34], [35], a spatiotemporal energy model of motion perception [36], and a divisive normalization model of nonlinear gain control in primary visual cortex [37]. We observed that flicker produces locally separated spectral signatures that lie almost along the same orientation as the motion-tuned plane of the reference video, but at a distance from it. The captured V1 responses to the flicker-induced spectral signatures generally decrease as object speed increases. We therefore measure the local difference of bandpass responses at each space-time frequency orientation and define the sum of the magnitude responses as a perceptual flicker visibility index. The proposed model predicts temporal masking effects on flicker distortions and thereby achieves highly competitive performance against previous flicker visibility prediction methods.

Background: Motion Perception

Motion perception is the process of inferring the speed and direction of moving objects. Since motion perception is important for understanding flicker distortions in videos, we model motion perception in the frequency domain. Watson and Ahumada [38] proposed a model of how humans sense the velocity of moving images, in which the motion-sensing elements appear locally tuned to specific spatiotemporal frequencies. Assuming that the complex motions of a video without scene changes can be constructed by piecing together spatiotemporally localized image patches undergoing translation, we can model the local spectral signatures of a video as an image patch moves [38]. An arbitrary space-time image patch can be represented by a function a(x, y, t) at each point x, y and time t, and its Fourier transform by A(u, v, w), where u, v, and w are the spatial and temporal frequency variables corresponding to x, y, and t, respectively. Let λ and φ denote the horizontal and vertical velocity components of the image patch. When the image patch translates at constant velocity [λ, φ], the moving video sequence becomes b(x, y, t) = a(x − λt, y − φt, t). The spectrum of a stationary image patch lies on the (u, v) plane, while the Fourier transform shears into an oblique plane through the origin, w = −(λu + φv), when the image patch moves (a worked derivation is given after Figure 1). The orientation of this plane indicates the speed and direction of motion.

Prediction of Perceptual Flicker Visibility

Linear Decomposition

Natural environments are inherently multi-scale and multi-orientation, and objects move multi-directionally at diverse speeds. To efficiently encode visual signals, the vision system decomposes the visual input into multiple channels tuned to different scales, orientations, and speeds; we model this decomposition with a spatiotemporal Gabor filter bank (see the sketch after Figure 1).

[Figure 1. Gabor filter bank in the frequency domain.]
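For completeness, the shear property cited above follows directly from the Fourier shift theorem; the short derivation below restates the standard Watson and Ahumada result [38] in the paper's notation.

```latex
% Spectrum of a translating image patch b(x,y,t) = a(x - \lambda t, y - \varphi t, t).
\begin{align}
B(u,v,w) &= \iiint a(x-\lambda t,\, y-\varphi t,\, t)\,
            e^{-j2\pi(ux+vy+wt)}\,dx\,dy\,dt \\
         &= \iiint a(x',y',t)\,
            e^{-j2\pi\left(ux'+vy'+(w+\lambda u+\varphi v)\,t\right)}\,dx'\,dy'\,dt \\
         &= A\!\left(u,\, v,\, w+\lambda u+\varphi v\right),
\end{align}
% using the substitution x' = x - \lambda t, y' = y - \varphi t. A patch that is
% otherwise static has its energy concentrated near w = 0, so the translating
% patch's spectrum lies near the oblique plane w = -(\lambda u + \varphi v),
% whose tilt encodes the velocity [\lambda, \varphi].
```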
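To suggest how the linear decomposition and the later energy and gain-control stages fit together, here is a minimal sketch assuming one Gaussian-windowed quadrature Gabor pair per channel; the filter parameters, bank layout, and semisaturation constant sigma_n are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor3d(f0, theta, ft, size=11, sigma=2.0):
    """One quadrature pair of space-time Gabor filters tuned to spatial
    frequency f0 (cycles/pixel) at orientation theta and temporal
    frequency ft (cycles/frame). Kernel axes are ordered (t, y, x)."""
    r = np.arange(size) - size // 2
    t, y, x = np.meshgrid(r, r, r, indexing='ij')
    env = np.exp(-(x**2 + y**2 + t**2) / (2.0 * sigma**2))  # Gaussian window
    phase = 2.0 * np.pi * (f0 * (x * np.cos(theta) + y * np.sin(theta)) + ft * t)
    return env * np.cos(phase), env * np.sin(phase)          # even/odd pair

def v1_responses(video, bank, sigma_n=0.1):
    """Motion energy [36] followed by divisive normalization [37].
    `video` is a float array of shape (T, H, W)."""
    energies = []
    for even, odd in bank:
        re = convolve(video, even, mode='nearest')
        ro = convolve(video, odd, mode='nearest')
        energies.append(re**2 + ro**2)        # phase-invariant quadrature energy
    energies = np.stack(energies)
    pool = energies.sum(axis=0) + sigma_n**2  # pooled activity plus semisaturation
    return energies / pool                    # gain-controlled channel responses

# Example: four orientations at a single scale and temporal frequency.
# bank = [gabor3d(0.2, th, 0.1) for th in np.linspace(0, np.pi, 4, endpoint=False)]
# responses = v1_responses(video.astype(np.float64), bank)
```

A flicker visibility index in the spirit of the text would then sum the magnitudes of the differences between reference and flicker-video responses across channels at each pixel.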

References

[1] A. C. Bovik et al., "A Flicker Detector Model of the Motion Silencing Illusion," 2012.
[2] M. H. Pinson et al., "A new standardized method for objectively measuring video quality," IEEE Transactions on Broadcasting, 2004.
[3] G. de Veciana et al., "Modeling the Time-Varying Subjective Quality of HTTP Video Streams With Rate Adaptations," IEEE Transactions on Image Processing, 2013.
[4] T. Ebrahimi et al., "On Evaluating Video Object Segmentation Quality: A Perceptually Driven Objective Metric," IEEE Journal of Selected Topics in Signal Processing, 2009.
[5] W. S. Geisler et al., "Multichannel Texture Analysis Using Localized Spatial Filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990.
[6] M. Yuen et al., "A survey of hybrid MC/DPCM/DCT video coding distortions," Signal Processing, 1998.
[7] A. C. Bovik et al., "Motion Tuned Spatio-Temporal Quality Assessment of Natural Videos," IEEE Transactions on Image Processing, 2010.
[8] N. C. Rust et al., "Do We Know What the Early Visual System Does?" The Journal of Neuroscience, 2005.
[9] D. J. Fleet et al., "Computation of component image velocity from local phase information," International Journal of Computer Vision, 1990.
[10] J. C. Candy et al., "Interframe coding of videotelephone pictures," 1972.
[11] Z. L. Budrikis et al., "Detail perception after scene changes in television image presentations," IEEE Transactions on Information Theory, 1965.
[12] E. H. Adelson et al., "Spatiotemporal energy models for the perception of motion," Journal of the Optical Society of America A, 1985.
[13] E. P. Simoncelli et al., "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, 2004.
[14] K. Zeng et al., "Characterizing perceptual artifacts in compressed video streams," Electronic Imaging, 2014.
[15] J. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A, 1985.
[16] A. C. Bovik et al., "Eccentricity effect of motion silencing on naturalistic videos," IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2015.
[17] R. Soundararajan et al., "Study of Subjective and Objective Quality Assessment of Video," IEEE Transactions on Image Processing, 2010.
[18] B. Girod et al., "The Information Theoretical Significance of Spatial and Temporal Masking in Video Signals," Photonics West: Lasers and Applications in Science and Engineering, 1989.
[19] G. de Veciana et al., "Video Quality Assessment on Mobile Devices: Subjective, Behavioral and Objective Studies," IEEE Journal of Selected Topics in Signal Processing, 2012.
[20] C. Charrier et al., "Blind Prediction of Natural Video Quality," IEEE Transactions on Image Processing, 2014.
[21] A. C. Bovik et al., "A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms," IEEE Transactions on Image Processing, 2006.
[22] E. P. Simoncelli et al., "A model of neuronal responses in visual area MT," Vision Research, 1998.
[23] J. Yang et al., "Flickering effect reduction for H.264/AVC intra frames," SPIE Optics East, 2006.
[24] E. P. Simoncelli et al., "Natural image statistics and neural representation," Annual Review of Neuroscience, 2001.
[25] A. C. Bovik et al., "Video QoE Models for the Compute Continuum," 2013.
[26] A. J. Ahumada et al., "Model of human visual-motion sensing," Journal of the Optical Society of America A, 1985.
[27] A. C. Bovik et al., "On the visibility of flicker distortions in naturalistic videos," Fifth International Workshop on Quality of Multimedia Experience (QoMEX), 2013.
[28] A. C. Bovik et al., "Automatic Prediction of Perceptual Image and Video Quality," Proceedings of the IEEE, 2013.
[29] A. C. Bovik et al., "Motion silencing of flicker distortions on naturalistic videos," Signal Processing: Image Communication, 2015.
[30] D. Heeger, "Normalization of cell responses in cat striate cortex," Visual Neuroscience, 1992.
[31] R. Soundararajan et al., "Video Quality Assessment by Reduced Reference Spatio-Temporal Entropic Differencing," IEEE Transactions on Circuits and Systems for Video Technology, 2013.
[32] D. Burr et al., "The 'motion silencing' illusion results from global motion and crowding," Journal of Vision, 2013.
[33] T. Q. Nguyen et al., "Adaptive Fuzzy Filtering for Artifact Reduction in Compressed Images and Videos," IEEE Transactions on Image Processing, 2009.
[34] A. Puri et al., "Motion-compensated video coding with adaptive perceptual quantization," IEEE Transactions on Circuits and Systems for Video Technology, 1991.
[35] L. K. Choi et al., "Spatiotemporal Flicker Detector Model of Motion Silencing," Perception, 2014.
[36] J. W. Suchow et al., "Motion Silences Awareness of Visual Change," Current Biology, 2011.