In subjective experiments that evaluate the perceptual audiovisual quality of multimedia and television services, the raw opinion scores offered by subjects are often noisy and unreliable. Recommendations such as ITU-R BT.500, ITU-T P.910 and ITU-T P.913 standardize post-processing procedures to clean up the raw opinion scores, using techniques such as subject outlier rejection and bias removal. In this paper, we analyze these standardized techniques and demonstrate their weaknesses. As an alternative, we propose a simple model to account for two of the most dominant behaviors of subject inaccuracy: bias (systematic error) and inconsistency (random error). We further show that this model can also effectively deal with inattentive subjects who give random scores. We propose to use maximum likelihood estimation (MLE) to jointly estimate the model parameters, and present two numerical solvers: the first based on the Newton-Raphson method, and the second based on alternating projection. We show that the second solver can be considered a generalization of the subject bias removal procedure in ITU-T P.913. We compare the proposed methods with the standardized techniques using real datasets and synthetic simulations, and demonstrate that the proposed methods offer better model-data fit, tighter confidence intervals, better robustness against subject outliers, shorter runtime, freedom from hard-coded parameters and thresholds, and auxiliary information on test subjects. The source code for this work is open-sourced at this https URL.
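To make the bias/inconsistency idea concrete, the following is a minimal sketch of an alternating-style estimator. It is not the released implementation: the abstract does not spell out the model, so the sketch assumes each raw score is the stimulus's true quality plus a per-subject bias plus zero-mean Gaussian noise whose standard deviation is that subject's inconsistency. The function name `fit_subject_model`, its parameters, and the fully observed score matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_subject_model(scores, n_iter=100, tol=1e-8):
    """Alternating estimation sketch (assumed model, not the paper's code).

    Assumed model: scores[i, j] = quality[j] + bias[i] + incons[i] * noise,
    with noise ~ N(0, 1). `scores` has shape (n_subjects, n_stimuli) and is
    assumed to be fully observed (no missing votes).
    """
    scores = np.asarray(scores, dtype=float)
    quality = scores.mean(axis=0)  # start from the plain mean opinion score
    for _ in range(n_iter):
        # Residual of every vote with respect to the current quality estimate.
        resid = scores - quality
        # Subject bias: average offset of a subject across all stimuli.
        bias = resid.mean(axis=1)
        # Subject inconsistency: spread of what remains after removing bias.
        incons = (resid - bias[:, None]).std(axis=1)
        incons = np.maximum(incons, 1e-6)  # guard against division by zero
        # Quality: bias-corrected votes, weighted by inverse noise variance,
        # so inconsistent (or random-voting) subjects count for less.
        weights = 1.0 / incons**2
        corrected = scores - bias[:, None]
        new_quality = (weights[:, None] * corrected).sum(axis=0) / weights.sum()
        converged = np.max(np.abs(new_quality - quality)) < tol
        quality = new_quality
        if converged:
            break
    return quality, bias, incons


if __name__ == "__main__":
    # Small synthetic demo: 20 subjects, 40 stimuli, all values made up.
    rng = np.random.default_rng(1)
    true_quality = rng.uniform(1.0, 5.0, size=40)
    true_bias = rng.normal(0.0, 0.5, size=20)
    true_incons = rng.uniform(0.3, 1.0, size=20)
    votes = (true_quality + true_bias[:, None]
             + true_incons[:, None] * rng.standard_normal((20, 40)))
    quality, bias, incons = fit_subject_model(votes)
    print("mean abs quality error:", np.mean(np.abs(quality - true_quality)))
```

If the weights are held equal and the loop is stopped after one pass, the quality estimate reduces to the mean of bias-corrected scores, which is the flavor of bias removal the abstract relates to ITU-T P.913. Note that quality and bias are only identified up to a shared constant; starting from the mean opinion score implicitly pins that constant in this sketch.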