RIRNet: Recurrent-In-Recurrent Network for Video Quality Assessment

Video quality assessment (VQA), which automatically predicts the perceptual quality of source videos, especially when reference information is not available, has become a major concern for video service providers due to end users' growing demand for video quality of experience (QoE). Although recent deep learning techniques have brought significant advances, they often produce misleading results in VQA tasks because they describe 3D spatio-temporal regularities using only a fixed temporal frequency. Partially inspired by psychophysical and vision science studies revealing the speed tuning property of neurons in the visual cortex during motion perception (i.e., sensitivity to different temporal frequencies), we propose a novel no-reference (NR) VQA framework named Recurrent-In-Recurrent Network (RIRNet) that incorporates this characteristic to promote an accurate representation of motion perception in the VQA task. By fusing motion information derived from different temporal frequencies in a more efficient way, the resulting temporal modeling scheme quantifies the temporal motion effect via a hierarchical distortion description. The proposed framework is found to be in closer agreement with the perceived quality of distorted videos because it integrates concepts from motion perception in the human visual system (HVS), which is reflected in the designed network structure composed of low- and high-level processing. A holistic validation of our method on four challenging video quality databases demonstrates superior performance over state-of-the-art methods.
