DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction

Recently, many models have been developed to predict video Quality of Experience (QoE), yet their applicability still faces significant challenges. Firstly, many models rely on features that are unique to a specific dataset and thus fail to generalize. Because these features interact in intricate ways, a unified representation that is independent of datasets with different modalities is needed. Secondly, existing models often lack the configurability to perform both classification and regression tasks. Thirdly, the datasets available for developing these models are often very small, and the impact of limited data on the performance of QoE models has not been adequately addressed. To address these issues, in this work we develop a novel end-to-end framework termed DeepQoE. The proposed framework first uses a combination of deep learning techniques, such as word embedding and 3D convolutional neural networks (C3D), to extract generalized features. Next, these features are combined and fed into a neural network for representation learning. The learned representation then serves as input for classification or regression tasks. We evaluate the performance of DeepQoE with three datasets. The results show that for the small datasets (WHU-MVQoE2016 and the Live-Netflix Video Database), the performance of state-of-the-art machine learning algorithms is greatly improved by using the QoE representation from DeepQoE (e.g., 35.71% to 44.82%), while for the large dataset (VideoSet), our DeepQoE framework achieves a significant performance improvement over the best baseline method (90.94% vs. 82.84%). In addition to the much improved performance, DeepQoE has the flexibility to fit different datasets, to learn a QoE representation, and to perform both classification and regression tasks. We also develop a DeepQoE-based adaptive bitrate streaming (ABR) system to verify that our framework can be easily applied to multimedia communication services. The software package of the DeepQoE framework has been released to facilitate current research on QoE.
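The three stages described above (per-modality feature extraction, fusion into a shared representation, and a task-specific output) can be illustrated with a minimal NumPy sketch. This is not the released DeepQoE package: the class name `RepresentationNet`, the helper `fuse_features`, all dimensions, and the random untrained weights are assumptions chosen for illustration, and the input vectors stand in for features that a word-embedding model and a C3D network would produce.

```python
import numpy as np

rng = np.random.default_rng(0)


def fuse_features(text_emb, video_feat, other_feat):
    """Concatenate per-modality feature vectors along the last axis."""
    return np.concatenate([text_emb, video_feat, other_feat], axis=-1)


class RepresentationNet:
    """Hypothetical one-hidden-layer network with two output heads.

    The hidden activation plays the role of the learned QoE representation;
    the weights here are random, so outputs are meaningless until trained.
    """

    def __init__(self, in_dim, rep_dim, n_classes):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, rep_dim))
        self.b1 = np.zeros(rep_dim)
        self.Wc = rng.normal(0.0, 0.1, (rep_dim, n_classes))  # classification head
        self.bc = np.zeros(n_classes)
        self.Wr = rng.normal(0.0, 0.1, (rep_dim, 1))  # regression head
        self.br = np.zeros(1)

    def represent(self, x):
        # ReLU hidden layer: the shared representation used by both heads.
        return np.maximum(0.0, x @ self.W1 + self.b1)

    def classify(self, x):
        # Softmax over discrete QoE levels (numerically stabilised).
        logits = self.represent(x) @ self.Wc + self.bc
        logits = logits - logits.max(axis=-1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=-1, keepdims=True)

    def regress(self, x):
        # Single continuous QoE score per sample.
        return (self.represent(x) @ self.Wr + self.br).squeeze(-1)


# Placeholder inputs: a batch of 4 samples with assumed feature sizes.
text_emb = rng.normal(size=(4, 50))     # stand-in for word-embedding features
video_feat = rng.normal(size=(4, 128))  # stand-in for C3D video features
other_feat = rng.normal(size=(4, 3))    # stand-in for numeric features such as bitrate

x = fuse_features(text_emb, video_feat, other_feat)
net = RepresentationNet(in_dim=x.shape[1], rep_dim=32, n_classes=5)
class_probs = net.classify(x)
scores = net.regress(x)
```

Because both heads read the same hidden layer, the representation returned by `represent` could also be passed to a separate classical learner, which mirrors the abstract's point that existing machine learning algorithms benefit from the learned QoE representation.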
