Scalable search-based image annotation

With the popularity of digital cameras, more and more people have accumulated considerable digital images on their personal devices. As a result, there are increasing needs to effectively search these personal images. Automatic image annotation may serve the goal, for the annotated keywords could facilitate the search processes. Although many image annotation methods have been proposed in recent years, their effectiveness on arbitrary personal images is constrained by their limited scalability, i.e. limited lexicon of small-scale training set. To be scalable, we propose a search-based image annotation algorithm that is analogous to information retrieval. First, content-based image retrieval technology is used to retrieve a set of visually similar images from a large-scale Web image set. Second, a text-based keyword search technique is used to obtain a ranked list of candidate annotations for each retrieved image. Third, a fusion algorithm is used to combine the ranked lists into a final candidate annotation list. Finally, the candidate annotations are re-ranked using Random Walk with Restarts and only the top ones are reserved as the final annotations. The application of both efficient search techniques and Web-scale image set guarantees the scalability of the proposed algorithm. Moreover, we provide an annotation rejection scheme to point out the images that our annotation system cannot handle well. Experimental results on U. Washington dataset show not only the effectiveness and efficiency of the proposed algorithm but also the advantage of image retrieval using annotation results over that using visual features.

[1]  Konrad Tollmar,et al.  Searching the Web with mobile images for location recognition , 2004, CVPR 2004.

[2]  Wei-Ying Ma,et al.  Image annotation by large-scale content-based image retrieval , 2006, MM '06.

[3]  Edward Y. Chang,et al.  CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines , 2003, IEEE Trans. Circuits Syst. Video Technol..

[4]  Stephen E. Robertson,et al.  Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval , 1994, SIGIR '94.

[5]  Jianping Fan,et al.  Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers , 2006, MM '06.

[6]  Changhu Wang,et al.  Content-Based Image Annotation Refinement , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[8]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Yuxiao Hu,et al.  Efficient propagation for face annotation in family albums , 2004, MULTIMEDIA '04.

[10]  Xing Xie,et al.  Photo-to-search: using multimodal queries to search the web from mobile devices , 2005, MIR '05.

[11]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[12]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[13]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[14]  J. Jeon,et al.  Automatic Image Annotation of News Images with Large Vocabularies and Low Quality Training Data , 2004 .

[15]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[16]  Neill W Campbell,et al.  IEEE International Conference on Computer Vision and Pattern Recognition , 2008 .

[17]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[18]  Changhu Wang,et al.  Scalable search-based image annotation of personal images , 2006, MIR '06.

[19]  Changhu Wang,et al.  Image annotation refinement using random walk with restarts , 2006, MM '06.

[20]  Wei-Ying Ma,et al.  Learning to cluster web search results , 2004, SIGIR '04.

[21]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[22]  Raimondo Schettini,et al.  Image annotation using SVM , 2003, IS&T/SPIE Electronic Imaging.

[23]  Wei-Ying Ma,et al.  EnjoyPhoto: a vertical image search engine for enjoying high-quality photos , 2006, MM '06.

[24]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[25]  Konrad Tollmar,et al.  Searching the Web with mobile images for location recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[26]  Gustavo Carneiro,et al.  Formulating semantic image annotation as a supervised learning problem , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Gustavo Carneiro,et al.  A database centric view of semantic image annotation and retrieval , 2005, SIGIR '05.

[28]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[29]  Latifur Khan,et al.  Image annotations by combining multiple evidence & wordNet , 2005, ACM Multimedia.

[30]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[33]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Divyakant Agrawal,et al.  Approximate nearest neighbor searching in multimedia databases , 2001, Proceedings 17th International Conference on Data Engineering.

[35]  Jing Hua,et al.  Region-based Image Annotation using Asymmetrical Support Vector Machine-based Multiple-Instance Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[36]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.