Exploring latent class information for image retrieval using the bag-of-feature model

Recently, the Bag-of-Feature (BoF) model has shown promising performance in object and generic image retrieval. The similarity between two images is typically measured by the distance between their BoF histograms. Because of imperfect local descriptors and quantization error, visually similar image patches can be wrongly quantized into different visual words, which makes this distance-based measure less accurate. To address this issue, this paper explores latent class information, where the latent class is formed by all the database images that share the same visual concept as the image being compared with a given query. We then cast image similarity as the probability that the query and a database image belong to the same latent class. Since a class of images together can better depict a visual concept, the shift from image-to-image to image-to-class comparison is expected to yield a more robust similarity measure. Because the ground truth of the latent class is not accessible in image retrieval, we define a latent class prior in our probabilistic model and derive its marginal distribution. This gives rise to a novel and efficient image similarity measure that can significantly improve retrieval performance without prolonging the retrieval process. Experiments on multiple benchmark data sets demonstrate its advantages.
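The following minimal Python sketch is for illustration only. It contrasts the two notions of similarity discussed above. Hard quantization to the nearest visual word and an L1 distance between normalised histograms stand in for the conventional BoF baseline. The `affinity` function, the k-nearest-neighbour class prior, and the parameters `sigma` and `k` are our own hypothetical choices. They are not the latent class prior or the marginal distribution derived in the paper. The sketch shows two things: how quantization can split near-identical patches into different words, and how an image-to-class score averages the query's affinity over a soft, unobserved class rather than over a single image.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def quantize(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Hard-assign a local descriptor to its nearest visual word (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[k])),
    )


def bof_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """L1-normalised bag-of-features histogram: one bin per visual word."""
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[quantize(desc, vocabulary)] += 1.0
    total = sum(hist) or 1.0  # guard against an image with no descriptors
    return [h / total for h in hist]


def l1_distance(h1: Vector, h2: Vector) -> float:
    """Conventional distance-based comparison of two BoF histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def affinity(h1: Vector, h2: Vector, sigma: float = 0.5) -> float:
    """Hypothetical image-to-image affinity in (0, 1], decreasing with L1 distance."""
    return math.exp(-l1_distance(h1, h2) / sigma)


def latent_class_similarity(query: Vector, database: List[Vector], i: int,
                            k: int = 3, sigma: float = 0.5) -> float:
    """Hypothetical image-to-class score between the query and database image i.

    The latent class of image i is unknown, so a soft prior is placed over its
    members: image i's k nearest database images, weighted by their affinity to i.
    The score is the query's affinity to a class member, marginalised over that prior.
    """
    # Nearest neighbours of image i by histogram distance; image i itself is at distance 0.
    neighbours = sorted(range(len(database)),
                        key=lambda j: l1_distance(database[i], database[j]))[:k]
    weights = [affinity(database[i], database[j], sigma) for j in neighbours]
    z = sum(weights)
    # Weighted average of the query's affinity to each assumed class member.
    return sum(w / z * affinity(query, database[j], sigma)
               for w, j in zip(weights, neighbours))


if __name__ == "__main__":
    vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    # Two nearly identical patches land in different visual words (quantization error).
    print(quantize([0.49, 0.0], vocab), quantize([0.51, 0.0], vocab))  # 0 1
```

Setting `k=1` reduces the assumed class to image i alone, which recovers a plain image-to-image comparison. The neighbour lists and weights depend only on the database, so in practice they could presumably be precomputed offline. This is an assumption about deployment, not a description of the paper's implementation.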
