Enhancing Reliability through Screening and Segmentation: An Online Video Subjective Quality of Experience Case Study

Abstract. In this paper we examine the reliability of subjective rating judgments along a single dimension, focusing on estimates of the technical quality degradation produced by integrity impairments and failures (non-accessibility and non-retainability) associated with viewing video. Subjective rating tasks often exhibit considerable variability, both within and between individuals. In the research reported here we consider different approaches to screening out unreliable participants. We review available alternatives, including a method developed by the ITU, a method based on screening outliers, a method based on the strength of correlations with an assumed "natural" ordering of impairments, and a clustering technique that makes no assumptions about the data. We report on an experiment that assesses the subjective quality of experience associated with impairments and failures of online video. We then assess the reliability of the results using a correlation method and a clustering method, both of which give similar results. Since the clustering method utilized here makes fewer assumptions about the data, it may be a useful supplement to existing techniques for assessing the reliability of participants who make subjective evaluations of the technical quality of videos.
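The correlation-based screening idea mentioned above can be sketched in code. The snippet below is an illustrative assumption, not the paper's exact procedure: each rater's scores are compared (via Spearman rank correlation) against an assumed "natural" ordering of impairment severity, and raters whose correlation falls below a threshold are flagged as candidates for exclusion. The function names, data layout, and the 0.7 threshold are all hypothetical.

```python
def spearman_rho(x, y):
    """Spearman rank correlation, with tied values given their average rank."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(v):
            j = i
            # Group tied values and assign them the average of their ranks.
            while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # 1-based average rank
            for k in range(i, j + 1):
                r[order[k]] = avg_rank
            i = j + 1
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def screen_raters(ratings, expected_quality, threshold=0.7):
    """Flag raters whose scores correlate weakly with the assumed ordering.

    ratings: dict mapping rater ID -> list of quality scores, one per
             test condition; expected_quality: assumed quality ordering
             for the same conditions (e.g. least to most impaired).
    Returns the rater IDs that fall below the (illustrative) threshold.
    """
    return [rid for rid, scores in ratings.items()
            if spearman_rho(scores, expected_quality) < threshold]
```

In use, a rater whose scores roughly track the assumed severity ordering passes, while one whose scores look random is flagged; the clustering alternative discussed in the paper avoids committing to such an ordering in the first place.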