Assessment of Machine Learning-Based Audiovisual Quality Predictors

Quality assessment of audiovisual (AV) signals is important for the design, optimization, and management of modern multimedia communication systems. However, automatic prediction of AV quality with computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to traditional approaches, especially when the assessment must be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach to ML-based model development for AV quality assessment and, more generally, for signal processing in multimedia communication. We also identify specific limitations of the current validation strategy that have implications for the analysis and comparison of ML-based quality predictors. These include a lack of consideration of: (a) data uncertainty, (b) domain knowledge, (c) explicit learning ability of the trained model, and (d) interpretability of the resultant model. The primary goal of this article is therefore to shed some light on these factors. Our analysis and proposed recommendations are of particular importance in light of the significant interest in ML methods for multimedia signal processing (specifically in cases where human-labeled data is used) and the lack of discussion of these issues in the existing literature.
