Gaze-guided Image Classification for Reflecting Perceptual Class Ambiguity

Despite advances in machine learning and deep neural networks, there is still a substantial gap between machine and human image understanding. One cause is the annotation process used to label training images. In most image categorization tasks, there is a fundamental ambiguity between some image categories, and the underlying class probability varies from clearly recognizable cases to highly ambiguous ones. However, current machine learning systems and applications usually rely on discrete annotation processes, so the training labels do not reflect this ambiguity. To address this issue, we propose a new image annotation framework in which labeling incorporates human gaze behavior. In this framework, gaze behavior is used to predict image labeling difficulty, and the image classifier is then trained with sample weights defined by the predicted difficulty. We demonstrate our approach's effectiveness on four-class image classification tasks.
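
To make the proposed pipeline concrete, the sketch below illustrates difficulty-weighted training in Python with scikit-learn. The specific gaze features (total dwell time and revisit count), the linear difficulty mapping, and the choice to down-weight ambiguous samples are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np
from sklearn.svm import SVC

def predict_difficulty(gaze_features):
    """Map per-image gaze features to a difficulty score in [0, 1].

    Assumes column 0 is total dwell time and column 1 is revisit
    count, and that longer or more repeated viewing implies a harder
    image -- both are illustrative assumptions.
    """
    dwell = gaze_features[:, 0]
    revisits = gaze_features[:, 1]
    return 0.5 * dwell / dwell.max() + 0.5 * revisits / revisits.max()

def train_weighted_classifier(X, y, difficulty):
    """Train a classifier with per-sample weights derived from difficulty.

    Ambiguous (hard) samples are down-weighted here; up-weighting them
    instead is an equally plausible reading of the framework.
    """
    weights = 1.0 - difficulty
    clf = SVC(kernel="rbf")
    clf.fit(X, y, sample_weight=weights)  # scikit-learn accepts per-sample weights
    return clf
```

In use, the image features X and labels y collected during annotation would be paired with the gaze recordings G from the annotators, e.g. `clf = train_weighted_classifier(X, y, predict_difficulty(G))`.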
