The Fraunhofer IDMT at ImageCLEF 2011 Photo Annotation Task

This paper presents the participation of the Fraunhofer IDMT in the ImageCLEF 2011 Photo Annotation Task. Our approach is focused on text-based features and strategies to combine visual and textual infor- mation. First, we apply a pre-processing step on the provided Flickr tags to reduce noise. For each concept, tf-idf values per tag are computed and used to construct a text-based descriptor. Second, we extract RGB-SIFT descriptors using the codebook approach. Visual and text-based features are combined, once with early fusion and once with late fusion. The con- cepts are learned with SVM classiers. Further, a post-processing step compares tags and concept names to each other. Our submission con- sists of one text-only and four multi-modal runs. The results show, that a combination of text-based and visual-features improves the result. Best results are achieved with the late fusion approach. The post-processing step only improves the results for some concepts, while others worsen. Overall, we scored a Mean Average Precision (MAP) of 37.1% and an example-based F-Measure (F-ex) of 55.2%.