Automatic tag expansion using visual similarity for photo sharing websites

In this paper we present an automatic photo tag expansion method designed for photo sharing websites. The purpose of the method is to suggest tags that are relevant to the visual content of a given photo at upload time. Both textual and visual cues are used in the process of tag expansion. When a photo is to be uploaded, the system asks for a couple of initial tags from the user. The initial tags are used to retrieve relevant photos together with their tags. These photos are assumed to be potentially content related to the uploaded target photo. The tag sets of the relevant photos are used to form the candidate tag list, and visual similarities between the target photo and relevant photos are used to give weights to these candidate tags. Tags with the highest weights are suggested to the user. The method is applied on Flickr (http://www.flickr.com). Results show that including visual information in the process of photo tagging increases accuracy with respect to text-based methods.

[1]  Shih-Fu Chang,et al.  Overview of the MPEG-7 standard , 2001, IEEE Trans. Circuits Syst. Video Technol..

[2]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[3]  Jianchang Mao,et al.  Towards the Semantic Web: Collaborative Tag Suggestions , 2006 .

[4]  Pinar Duygulu Sahin,et al.  Tag Suggestr: Automatic Photo Tag Expansion Using Visual Information for Photo Sharing Websites , 2008, SAMT.

[5]  Gilad Mishne,et al.  AutoTag: a collaborative approach to automated tag assignment for weblog posts , 2006, WWW '06.

[6]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[7]  Andreas Hotho,et al.  Tag recommendations in social bookmarking systems , 2008, AI Commun..

[8]  Stefanie N. Lindstaedt,et al.  Automatic image annotation using visual content and folksonomies , 2009, Multimedia Tools and Applications.

[9]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[10]  Hui Wan,et al.  Personalized Tag Recommendations via Tagging and Content-based Similarity Metrics , 2007, ICWSM.

[11]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[12]  David A. Forsyth,et al.  Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[13]  Shih-Fu Chang,et al.  Image Retrieval: Current Techniques, Promising Directions, and Open Issues , 1999, J. Vis. Commun. Image Represent..

[14]  Arthur Pitman,et al.  Using Visual Features to Improve Tag Suggestions in Image Sharing Sites , 2008 .

[15]  Gustavo Carneiro,et al.  Formulating semantic image annotation as a supervised learning problem , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Roelof van Zwol,et al.  Flickr tag recommendation based on collective knowledge , 2008, WWW.

[17]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[18]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[19]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[20]  Yanfeng Sun,et al.  MiAlbum - a system for home photo managemet using the semi-automatic image annotation approach , 2000, MM 2000.

[21]  Gang Wang,et al.  Building text features for object image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Fotis Lazarinis,et al.  Engineering and utilizing a stopword list in Greek Web retrieval , 2007, J. Assoc. Inf. Sci. Technol..

[24]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Mary Czerwinski,et al.  Semi-Automatic Image Annotation , 2001, INTERACT.

[27]  Shumeet Baluja,et al.  VisualRank: Applying PageRank to Large-Scale Image Search , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Marcel Worring,et al.  Learning Social Tag Relevance by Neighbor Voting , 2009, IEEE Transactions on Multimedia.

[29]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[30]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[31]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[32]  Mor Naaman,et al.  HT06, tagging paper, taxonomy, Flickr, academic article, to read , 2006, HYPERTEXT '06.

[33]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[34]  Changhu Wang,et al.  Scalable search-based image annotation , 2008, Multimedia Systems.

[35]  Marcel Worring,et al.  Learning tag relevance by neighbor voting for social image retrieval , 2008, MIR '08.

[36]  Christos Faloutsos,et al.  Automatic image captioning , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[37]  Daniel Gatica-Perez,et al.  PLSA-based image auto-annotation: constraining the latent space , 2004, MULTIMEDIA '04.