From Pairwise Comparisons and Rating to a Unified Quality Scale

The goal of psychometric scaling is to quantify perceptual experiences and to understand the relationship between an external stimulus, its internal representation, and the response. In this paper, we propose a probabilistic framework for fusing the outcomes of different psychophysical experimental protocols, namely rating and pairwise comparison experiments. The method can be used to merge existing subjective datasets and to analyze experiments in which both types of measurement are collected. We analyze and compare the two protocols in terms of time and accuracy, using simulations and experiments on benchmark and real-world image quality assessment datasets. The results show the necessity of scaling and the advantages of each protocol and of mixing them. Although most of our examples focus on image quality assessment, our findings generalize to any other subjective quality-of-experience task.
