Towards the Prediction of the Quality of Experience from Facial Expression and Gaze Direction

In this paper, we investigate the potential to implicitly estimate the Quality of Experience (QoE) of a video streaming user by acquiring a video of her face and monitoring her facial expression and gaze direction. To this aim, we conducted a crowdsourcing test in which participants were asked to watch 20 videos subject to different impairments and rate their quality, while their faces were recorded with their PCs' webcams. The following features were then extracted: the Action Units (AUs) that represent the facial expression, and the positions of the eyes' pupils. These features were then used, together with the QoE values provided by the participants, to train three machine learning classifiers, namely, a Support Vector Machine with a quadratic kernel, RUSBoost trees, and bagged trees. We considered two prediction models: one using only the AU features, and one using them together with the positions of the eyes' pupils. The RUSBoost trees achieved the best results in terms of accuracy, sensitivity, and area under the curve. In particular, when all the features were considered, the achieved accuracy was 44.7%, 59.4% and 75.3% when using the 5-level, 3-level and 2-level quality scales, respectively. While these results are not yet satisfactory, they represent a promising basis.
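The classification setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the AU and pupil-position features are replaced by random placeholder data, the number of AU and gaze features is assumed, and two of the three classifiers (the quadratic-kernel SVM and bagged trees) are shown using scikit-learn; RUSBoost is available separately in the imbalanced-learn package as `RUSBoostClassifier`.

```python
# Sketch of the two prediction models from the abstract: Model 1 uses only
# Action Unit (AU) features, Model 2 adds the pupil positions. Feature
# extraction from face video is replaced here by random placeholder data.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples = 200
n_au = 17    # number of AU intensity features (assumed, not from the paper)
n_gaze = 4   # pupil x/y coordinates for both eyes (assumed)

X_au = rng.random((n_samples, n_au))
X_gaze = rng.random((n_samples, n_gaze))
y = rng.integers(1, 6, size=n_samples)   # 5-level quality scale (1..5)

# Model 1: AU features only; Model 2: AU features + pupil positions.
X_model1 = X_au
X_model2 = np.hstack([X_au, X_gaze])

svm_quad = SVC(kernel="poly", degree=2)               # quadratic-kernel SVM
bagged = BaggingClassifier(DecisionTreeClassifier())  # bagged trees

for name, clf in [("SVM (quadratic)", svm_quad), ("Bagged trees", bagged)]:
    acc = cross_val_score(clf, X_model2, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

On random features the cross-validated accuracy hovers around chance level (about 0.2 for five classes); with real AU and gaze features the paper reports up to 44.7% on the 5-level scale.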