No-reference video quality assessment metric using spatiotemporal features through LSTM

A precise video quality assessment (VQA) model is essential for maintaining quality of service (QoS). However, most existing VQA metrics are designed for specific purposes and ignore the spatiotemporal features of natural video. To address both issues, this paper proposes VQA-LSTM, a novel general-purpose no-reference (NR) VQA metric built on Long Short-Term Memory (LSTM) modules with a masking layer and a pre-padding strategy. First, we divide the distorted video into frames and extract significant yet universal spatial and temporal features that effectively reflect frame quality. Second, a data preprocessing stage and the pre-padding strategy prepare the data so that VQA-LSTM is easier to train. Finally, a three-layer LSTM model with a masking layer learns the sequence of spatial features as spatiotemporal features, and the sequence of temporal features as the gradient of temporal features, in order to evaluate video quality. We test VQA-LSTM on two widely used VQA databases, MCL-V and LIVE, to demonstrate its robustness; the experimental results show that VQA-LSTM correlates better with human perception than several state-of-the-art approaches.
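
As a rough illustration of the pre-padding and masking idea, the sketch below left-pads variable-length sequences of per-frame feature vectors to a common length and builds a boolean mask marking the real frames. It is not the authors' implementation: the function name `pre_pad`, the pad value of 0.0, and the toy feature values are assumptions made for this example. In a deep learning framework, a masking layer (for instance Keras `Masking`) would play the role of the explicit mask by skipping timesteps whose features all equal the pad value.

```python
from typing import List, Tuple

Sequence = List[List[float]]  # one video: a list of per-frame feature vectors


def pre_pad(videos: List[Sequence], pad_value: float = 0.0
            ) -> Tuple[List[Sequence], List[List[bool]]]:
    """Left-pad every video to the length of the longest one.

    Returns the padded sequences and a mask that is True for real
    frames and False for padding. pad_value is a hypothetical choice.
    """
    max_len = max(len(v) for v in videos)
    dim = len(videos[0][0])  # assumes all frames share one feature size
    padded, mask = [], []
    for v in videos:
        n_pad = max_len - len(v)
        # Padding goes in front, so the real frames sit at the end of
        # the sequence, closest to the LSTM's final state.
        padded.append([[pad_value] * dim for _ in range(n_pad)]
                      + [list(f) for f in v])
        mask.append([False] * n_pad + [True] * len(v))
    return padded, mask


# Toy example: a 3-frame video and a 1-frame video, 2 features per frame.
videos = [
    [[0.2, 0.5], [0.3, 0.4], [0.1, 0.9]],
    [[0.7, 0.6]],
]
padded, mask = pre_pad(videos)
print(padded[1])  # [[0.0, 0.0], [0.0, 0.0], [0.7, 0.6]]
print(mask[1])    # [False, False, True]
```

Padding at the front rather than the back keeps the informative frames adjacent to the last timestep, which is the usual motivation for pre-padding with recurrent models; the mask lets the model ignore the padded timesteps entirely.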
