Image retrieval via learning content-based deep quality model towards big data

Abstract Image retrieval aims to search for specific images in large-scale datasets. Traditional text-based and content-based image retrieval approaches have shown competitive performance. However, both are limited by the semantic gap, i.e., they cannot reflect human perception of images. To narrow the semantic gap in image retrieval, this paper proposes a deep neural network (DNN) based image retrieval method, in which a saliency map is derived and used to form human gaze shifting paths under constraint metrics. More specifically, we first design a DNN-based image saliency prediction model. Subsequently, we leverage an image quality assessment (IQA) algorithm to select high-quality salient regions, which are concatenated in sequence using the proposed constraint metrics to mimic human visual perception. Afterwards, we leverage a CNN-based architecture to acquire a deep representation of each image, in which the spatial structure among salient regions is well preserved. Then, based on the quality score of the query image, we derive a series of candidate images whose quality scores are similar to that of the query image. Finally, we engineer a ranking distance metric to refine the candidate images and complete the retrieval. Extensive experiments demonstrate that our method outperforms several state-of-the-art algorithms.
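The last two stages described above (candidate selection by quality score, then refinement by a ranking distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `retrieve`, the tolerance `quality_tol`, the `(image_id, feature, quality)` gallery layout, and the use of plain Euclidean distance in place of the paper's ranking distance metric are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical gallery entry: (image_id, deep feature vector, quality score).
GalleryItem = Tuple[str, Sequence[float], float]


def retrieve(
    query_feat: Sequence[float],
    query_quality: float,
    gallery: List[GalleryItem],
    quality_tol: float = 0.1,  # assumed tolerance, not a value from the paper
    top_k: int = 5,
) -> List[str]:
    """Two-stage retrieval sketch: quality-based filtering, then distance ranking."""
    # Stage 1: keep only images whose quality score is close to the query's.
    candidates = [
        (img_id, feat)
        for img_id, feat, quality in gallery
        if abs(quality - query_quality) <= quality_tol
    ]
    # Stage 2: rank the candidates by distance to the query feature.
    # Euclidean distance stands in for the paper's ranking distance metric.
    ranked = sorted(candidates, key=lambda item: math.dist(query_feat, item[1]))
    return [img_id for img_id, _ in ranked[:top_k]]


if __name__ == "__main__":
    gallery = [
        ("a", [0.0, 0.0], 0.50),
        ("b", [1.0, 1.0], 0.55),
        ("c", [0.1, 0.1], 0.90),  # close in feature space, but quality differs
        ("d", [0.2, 0.0], 0.45),
    ]
    print(retrieve([0.0, 0.0], 0.50, gallery, quality_tol=0.1, top_k=2))
```

In this sketch, image "c" is excluded in the first stage even though its feature is close to the query, because its quality score falls outside the tolerance. This mirrors the idea of restricting the search to quality-similar candidates before ranking.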
