Saliency map driven image retrieval combining the bag-of-words model and PLSA

A new image retrieval system is proposed that combines the bag-of-words (BoW) model with Probabilistic Latent Semantic Analysis (PLSA). First, interest points are detected in each image using the Hessian-Affine keypoint detector, and Scale Invariant Feature Transform (SIFT) descriptors are computed at those points. Graph-based visual saliency maps are then used to detect and discard outlier descriptors, so that SIFT features lying in non-salient regions are removed. The remaining descriptors are divided into a number of subsets, and a partial vocabulary is extracted from each subset. The final vocabulary used in the BoW model is obtained by concatenating the partial vocabularies. The resulting BoW representations are weighted using the TF-IDF scheme. Finally, PLSA is employed to perform a probabilistic mixture decomposition of the weighted BoW representations. Query expansion is demonstrated to improve the retrieval quality. Overall, a mean average precision of 0.79 is reported when saliency filtering is applied to the SIFT descriptors and the BoW plus PLSA method is used.
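The saliency filtering, TF-IDF weighting, and PLSA steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the saliency threshold, the TF-IDF variant, the number of latent topics, or the number of EM iterations, so the function names (`filter_by_saliency`, `tfidf`, `plsa`), the 0.5 threshold, and the toy histogram below are all assumptions chosen for illustration.

```python
import numpy as np

def filter_by_saliency(keypoints_xy, saliency, threshold=0.5):
    """Return a boolean mask of keypoints lying in salient regions.

    keypoints_xy: (n, 2) array of (x, y) pixel coordinates.
    saliency: 2-D map scaled to [0, 1], indexed as saliency[y, x].
    threshold: assumed cut-off; the abstract does not give one.
    """
    xy = np.round(np.asarray(keypoints_xy)).astype(int)
    return saliency[xy[:, 1], xy[:, 0]] >= threshold

def tfidf(counts):
    """Weight an (n_images, n_words) visual-word histogram with a common TF-IDF form."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    df = np.count_nonzero(counts, axis=0)            # images containing each word
    idf = np.log(counts.shape[0] / np.maximum(df, 1))
    return tf * idf

def plsa(X, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a non-negative (n_docs, n_words) matrix.

    Returns P(z|d) with shape (n_docs, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # M-step: re-estimate both distributions from the weighted counts
        weighted = X[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_z_d, p_w_z

# Toy usage: 4 images, 4 visual words, 2 latent topics (all values hypothetical).
counts = np.array([[4, 0, 1, 0],
                   [3, 1, 0, 0],
                   [0, 0, 2, 5],
                   [0, 1, 3, 4]])
p_z_d, p_w_z = plsa(tfidf(counts), n_topics=2)
```

In a retrieval setting, each row of `p_z_d` could serve as a compact topic-space signature for an image, to be compared against the query's signature. The dense E-step here allocates an array of size images × topics × words, so it is only practical for small examples.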
