Comparing subjective video quality testing methodologies

International recommendations for subjective video quality assessment (e.g., ITU-R BT.500-11) specify how to perform many different types of subjective tests. Some of these test methods are double stimulus, in which viewers rate the quality, or the change in quality, between two video streams (reference and impaired). Others are single stimulus, in which viewers rate the quality of just one video stream (the impaired). Two examples of the former are the double stimulus continuous quality scale (DSCQS) and the double stimulus comparison scale (DSCS). An example of the latter is single stimulus continuous quality evaluation (SSCQE). Each subjective test methodology has claimed advantages. For instance, the DSCQS method is claimed to be less sensitive to context (i.e., subjective ratings are less influenced by the severity and ordering of the impairments within the test session). The SSCQE method is claimed to yield more representative quality estimates for quality monitoring applications. This paper considers data from six different subjective video quality experiments, originally performed with the SSCQE, DSCQS and DSCS methodologies. Subsets of video clips from each of these six experiments were combined and rated in a secondary SSCQE subjective video quality test. We give a method for post-processing the secondary SSCQE data to produce quality scores that are highly correlated with the original DSCQS and DSCS data. We also provide evidence that human memory effects for time-varying quality estimation seem to be limited to about 15 seconds.
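As a rough illustration of what post-processing SSCQE data for comparison with double stimulus scores can involve, the sketch below averages the final seconds of each clip's continuous rating trace and then computes the Pearson correlation against a set of per-clip double stimulus scores. This is a minimal sketch under stated assumptions, not the method developed in the paper: the function names, the `sample_rate_hz` parameter, and the choice of a trailing 15-second window (motivated only by the memory observation above) are all hypothetical.

```python
from statistics import mean


def trailing_window_score(samples, sample_rate_hz, window_s=15.0):
    """Average the last `window_s` seconds of one clip's SSCQE trace.

    Assumption: `samples` is the time-ordered slider trace for a single
    clip, already averaged across viewers. The 15 s default is an
    illustrative choice, not a value prescribed by the paper's method.
    """
    n = max(1, int(round(window_s * sample_rate_hz)))
    # Slicing with -n keeps the whole trace when it is shorter than the window.
    return mean(samples[-n:])


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

In use, one would call `trailing_window_score` once per clip to obtain a single SSCQE-derived score, then pass the list of those scores and the matching list of DSCQS or DSCS scores to `pearson`. Repeating this for several values of `window_s` is one simple way to see how far back in time the ratings appear to depend on the video.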