Evaluating Artificial Systems for Pairwise Ranking Tasks Sensitive to Individual Differences

Owing to the advancement of deep learning, artificial systems are now rival to humans in several pattern recognition tasks, such as visual recognition of object categories. However, this is only the case with the tasks for which correct answers exist independent of human perception. There is another type of tasks for which what to predict is human perception itself, in which there are often individual differences. Then, there are no longer single "correct" answers to predict, which makes evaluation of artificial systems difficult. In this paper, focusing on pairwise ranking tasks sensitive to individual differences, we propose an evaluation method. Given a ranking result for multiple item pairs that is generated by an artificial system, our method quantifies the probability that the same ranking result will be generated by humans, and judges if it is distinguishable from human-generated results. We introduce a probabilistic model of human ranking behavior, and present an efficient computation method for the judgment. To estimate model parameters accurately from small-size samples, we present a method that uses confidence scores given by annotators for ranking each item pair. Taking as an example a task of ranking image pairs according to material attributes of objects, we demonstrate how the proposed method works.

[1]  Xiaoou Tang,et al.  Image Aesthetic Assessment: An experimental survey , 2016, IEEE Signal Processing Magazine.

[2]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[3]  Takayuki Okatani,et al.  Integrating deep features for material recognition , 2015, 2016 23rd International Conference on Pattern Recognition (ICPR).

[4]  Christiane B. Wiebel,et al.  Perceptual qualities and material classes. , 2013, Journal of vision.

[5]  Gabriela Csurka,et al.  Assessing the aesthetic quality of photographs using generic image descriptors , 2011, 2011 International Conference on Computer Vision.

[6]  Radomír Mech,et al.  Photo Aesthetics Ranking Network with Attributes and Content Adaptation , 2016, ECCV.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  E. Adelson,et al.  Accuracy and speed of material categorization in real-world images. , 2014, Journal of vision.

[9]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  R. Fleming Visual perception of materials and their properties , 2014, Vision Research.

[11]  Naila Murray,et al.  AVA: A large-scale database for aesthetic visual analysis , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  Jianxiong Xiao,et al.  What Makes a Photograph Memorable? , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[15]  Wei Liu,et al.  RankCNN: When learning to rank encounters the pseudo preference feedback , 2014, Comput. Stand. Interfaces.

[16]  Jiebo Luo,et al.  Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark , 2016, AAAI.