TELECOM ParisTech at ImageCLEF 2010 Photo Annotation Task: Combining Tags and Visual Features for Learning-Based Image Annotation

In this paper, we describe the participation of TELECOM ParisTech in the ImageCLEF 2010 Photo Annotation challenge. This edition focuses on promoting the combination of visual and tag features in order to enhance photo annotation. An image collection is supplied with tags, which are used both for training and testing. Our training approach consists of building SVM classifiers with kernels that take into account the similarity between visual features as well as between tags. The results corroborate (i) the complementarity of tags and visual descriptors and (ii) the effectiveness of SVM classifiers in photo annotation.
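To make the idea of a kernel that accounts for both visual and tag similarity concrete, here is a minimal sketch. The abstract does not say which kernels or combination rule the authors used, so everything below is an assumption chosen for illustration: a histogram intersection kernel on visual histograms, a tag-overlap kernel (a linear kernel on binary tag indicators), and a convex combination controlled by a hypothetical `weight` parameter. The function names and the `hist`/`tags` record layout are likewise hypothetical.

```python
def histogram_intersection(x, y):
    """Assumed visual kernel: sum of element-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(x, y))


def tag_overlap(tags_a, tags_b):
    """Assumed tag kernel: number of shared tags, i.e. a linear kernel
    on binary tag-indicator vectors."""
    return float(len(set(tags_a) & set(tags_b)))


def combined_kernel(img_a, img_b, weight=0.5):
    """Convex combination of the two kernels (weight is hypothetical).

    Both base kernels are positive semi-definite on non-negative
    histograms and tag sets, so any weight in [0, 1] gives a valid kernel.
    """
    k_vis = histogram_intersection(img_a["hist"], img_b["hist"])
    k_tag = tag_overlap(img_a["tags"], img_b["tags"])
    return weight * k_vis + (1.0 - weight) * k_tag


def gram_matrix(images, weight=0.5):
    """Pairwise kernel values for a list of images."""
    return [[combined_kernel(a, b, weight) for b in images] for a in images]


if __name__ == "__main__":
    # Toy data, invented for this example only.
    images = [
        {"hist": [0.5, 0.3, 0.2], "tags": {"beach", "sea"}},
        {"hist": [0.4, 0.4, 0.2], "tags": {"sea", "sky"}},
        {"hist": [0.1, 0.1, 0.8], "tags": {"city"}},
    ]
    for row in gram_matrix(images):
        print(row)
```

A Gram matrix built this way could be passed to any SVM implementation that accepts precomputed kernels, with one binary classifier trained per annotation concept. Setting `weight` to 1 or 0 recovers a visual-only or tag-only classifier, which is one simple way to check whether the two modalities are complementary.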
