Totally Looks Like - How Humans Compare, Compared to Machines

Human perceptual judgment of image similarity relies on rich internal representations ranging from low-level features to high-level concepts, scene properties, and even cultural associations. Existing methods and datasets that attempt to explain perceived similarity, even those designed for this purpose, use stimuli that arguably do not cover the full breadth of factors affecting human similarity judgments. We introduce a new dataset, dubbed Totally-Looks-Like (TLL) after a popular entertainment website, which contains images paired by humans as being visually similar. The dataset contains 6016 image pairs from the wild, shedding light on a rich and diverse set of criteria employed by human beings. We conduct experiments that try to reproduce the pairings using features extracted from state-of-the-art deep convolutional neural networks, as well as additional human experiments to verify the consistency of the collected data. Even when we artificially create conditions that make the matching task progressively easier, machine-extracted representations perform very poorly at reproducing the matchings selected by humans. The results suggest future directions for improving learned image representations. Data and code will be available at https://sites.google.com/view/totally-looks-like-dataset.
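To make the matching task concrete, the following is a minimal sketch of how one might score a feature representation against human pairings. It is not the authors' evaluation protocol: the function names (`cosine`, `top_k_recall`), the use of cosine similarity, and the toy two-dimensional features are all assumptions made for illustration. The only convention taken from the abstract is that each image has a human-chosen partner, here assumed to sit at the same index in the second list.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_recall(left_feats, right_feats, k=1):
    """Fraction of left images whose human-chosen partner (assumed to be the
    right image at the same index) is among the k most similar right images."""
    hits = 0
    for i, u in enumerate(left_feats):
        sims = [cosine(u, v) for v in right_feats]
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(right_feats)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(left_feats)


# Toy features standing in for CNN embeddings (hypothetical values).
left = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right = [[0.9, 0.1], [1.0, 0.9], [0.1, 1.0]]

for k in (1, 2, 3):
    print(f"recall@{k} = {top_k_recall(left, right, k):.3f}")
```

Under this sketch, raising `k` or shrinking the candidate list are two simple ways a matching task could be made easier; whether these correspond to the conditions used in the paper is not stated in the abstract.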
