This paper presents the participation of Fraunhofer IDMT in the ImageCLEF 2011 Photo Annotation Task. Our approach focuses on text-based features and on strategies for combining visual and textual information. First, we apply a pre-processing step to the provided Flickr tags to reduce noise. For each concept, tf-idf values per tag are computed and used to construct a text-based descriptor. Second, we extract RGB-SIFT descriptors using the codebook approach. Visual and text-based features are combined, once with early fusion and once with late fusion. The concepts are learned with SVM classifiers. Further, a post-processing step compares tags and concept names with each other. Our submission consists of one text-only run and four multi-modal runs. The results show that combining text-based and visual features improves performance over the text-only run. The best results are achieved with the late fusion approach. The post-processing step improves the results for some concepts but worsens them for others. Overall, we scored a Mean Average Precision (MAP) of 37.1% and an example-based F-Measure (F-ex) of 55.2%.
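As a rough illustration of the text descriptor and the two fusion strategies named above, the following Python sketch may help. It is not the submitted system: the function names, the pooling of tags per concept into one "document", the standard tf-idf formula, and the weighted average used for late fusion are all assumptions made for this example, and the classifier scores are placeholder numbers rather than SVM outputs.

```python
import math
from collections import Counter


def tfidf_per_concept(tags_by_concept):
    """Weight each tag per concept with tf-idf.

    Assumption: all tags of the training images of one concept are pooled
    into a single 'document'; the paper may define this differently.
    """
    n_concepts = len(tags_by_concept)
    # Document frequency: number of concepts in which a tag occurs.
    df = Counter()
    for tags in tags_by_concept.values():
        df.update(set(tags))

    weights = {}
    for concept, tags in tags_by_concept.items():
        counts = Counter(tags)
        total = len(tags)
        weights[concept] = {
            tag: (count / total) * math.log(n_concepts / df[tag])
            for tag, count in counts.items()
        }
    return weights


def early_fusion(visual_vec, text_vec):
    """Early fusion: concatenate both descriptors before training one classifier."""
    return list(visual_vec) + list(text_vec)


def late_fusion(visual_score, text_score, text_weight=0.5):
    """Late fusion: combine the scores of two separately trained classifiers.

    Assumption: a weighted average; other combination rules are possible.
    """
    return (1.0 - text_weight) * visual_score + text_weight * text_score


if __name__ == "__main__":
    toy_tags = {
        "dog": ["dog", "puppy", "dog", "park"],
        "beach": ["sea", "sand", "park"],
    }
    print(tfidf_per_concept(toy_tags)["dog"])
    print(early_fusion([0.1, 0.2], [0.3]))
    print(late_fusion(0.8, 0.4))
```

In this toy setup, a tag that occurs in every concept (here "park") receives a weight of zero, which is the noise-suppressing behaviour one would expect from tf-idf. Early fusion hands a single concatenated vector to one classifier, while late fusion keeps the two modalities separate until their scores are merged.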