How Deep is Your Encoder: An Analysis of Features Descriptors for an Autoencoder-Based Audio-Visual Quality Metric

Developing audio-visual quality assessment models that produce accurate predictions poses a number of challenges. One of these challenges is modelling the complex interaction between audio and visual stimuli and how human users interpret this interaction. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) addresses this problem from a machine learning perspective. The metric receives two sets of feature descriptors, one for audio and one for video, and produces a low-dimensional set of features that is used to predict audio-visual quality. A basic implementation of NAViDAd produced accurate predictions when tested on a range of audio-visual databases. The current work performs an ablation study on the base architecture of the metric. Several modules are removed or re-trained with different configurations to gain a better understanding of how the metric functions. The results of this study provide important feedback that allows us to understand the real capacity of the metric's architecture and, eventually, to develop a better audio-visual quality metric.
