A Latent Model for Visual Disambiguation of Keyword-based Image Search

The problem of polysemy in keyword-based image search arises mainly from the inherent ambiguity of user queries. We propose an approach based on a latent model that resolves this search ambiguity by allowing sense-specific diversity in the results. Given a query keyword and the images retrieved by issuing the query to an image search engine, we first learn a latent visual sense model over these polysemous images. Next, we use Wikipedia to disambiguate the word senses of the original query and issue these Wiki-senses as new queries to retrieve sense-specific images. A sense-specific image classifier is then learnt from these results, combined with information from the latent visual sense model, and used to cluster and re-rank the polysemous images retrieved for the original query keyword into its specific senses. Results on a ground-truth set of 17K images returned by 10 keyword searches, spanning 62 word senses, provide empirical indications that our method can improve upon existing keyword-based search engines. Our method learns the visual word sense models in a completely unsupervised manner, effectively filters out irrelevant images, and is able to mine the long tail of image search.
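To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the steps described above, assuming images are already encoded as bag-of-visual-words count histograms and using an off-the-shelf LDA topic model as the latent visual sense model. The variable names (`query_hists`, `sense_hists`) and the choice of a logistic-regression classifier are illustrative assumptions.

```python
# Sketch of: latent visual sense model -> sense-specific classifier -> re-ranking.
# Assumes bag-of-visual-words histograms; data below is synthetic stand-in.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab_size = 500  # size of the visual vocabulary (assumed)

# Histograms of the polysemous images returned for the original keyword ...
query_hists = rng.poisson(1.0, size=(200, vocab_size))
# ... and of images returned for each Wikipedia-derived sense query (hypothetical senses).
sense_hists = {
    "mouse_(computing)": rng.poisson(1.0, size=(60, vocab_size)),
    "mouse_(animal)":    rng.poisson(1.0, size=(60, vocab_size)),
}

# 1. Learn a latent visual sense model over the ambiguous result set.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
lda.fit(query_hists)

# 2. Represent the sense-specific images in the latent topic space and
#    train a classifier that separates the senses.
X = np.vstack([lda.transform(h) for h in sense_hists.values()])
y = np.concatenate([np.full(len(h), i) for i, h in enumerate(sense_hists.values())])
clf = LogisticRegression(max_iter=1000).fit(X, y)

# 3. Cluster and re-rank the original polysemous images by sense confidence.
probs = clf.predict_proba(lda.transform(query_hists))
assigned_sense = probs.argmax(axis=1)
ranking_per_sense = {
    sense: np.argsort(-probs[:, i])  # image indices, most confident first
    for i, sense in enumerate(sense_hists)
}
```

In this reading, images whose maximum sense probability stays low can be treated as irrelevant and filtered out, which corresponds to the filtering behaviour claimed in the abstract.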
