Motivated by the adage that a “picture is worth a thousand words”, it can be reasoned that automatically enriching the textual content of
a document with relevant images can increase its readability. Moreover, features extracted from the additional image
data inserted into the textual content of a document may, in principle, also be used by a retrieval engine to better match the topic of a
document with that of a given query. In this paper, we describe our approach to building a ground truth dataset to enable further research
into the automatic addition of relevant images to text documents. The dataset comprises the official ImageCLEF 2010 collection (a
collection of images with textual metadata), which serves as the pool of images available for automatic enrichment of text; a set of 25 benchmark
documents to be enriched, in this case children’s short stories; and a set of manually judged relevant images for each
query story, obtained by the standard procedure of depth pooling. We use this benchmark dataset to evaluate the effectiveness of standard
information retrieval methods as simple baselines for this task. The results indicate that using the whole story as a weighted query,
where the weight of each query term is its tf-idf value, achieves a precision of 0.1714 within the top 5 retrieved images on average.
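
For concreteness, depth pooling can be sketched as below; this is a minimal illustration, and the pooling depth of 50 is an assumed value (a common choice in TREC-style evaluations), not the depth used for this benchmark.

```python
def depth_pool(runs, depth=50):
    """Form a judgment pool by depth pooling: take the union of the
    top-`depth` results contributed by each participating retrieval run.

    runs:  list of ranked result lists (image ids), one per system.
    depth: pool depth; 50 is an assumed, illustrative value.
    """
    pool = set()
    for ranked_ids in runs:
        pool.update(ranked_ids[:depth])
    return pool  # the set of image ids sent for manual relevance judgment
```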
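Similarly, the tf-idf weighted-query baseline can be sketched as follows; the exact weighting variant (raw term frequency times log inverse document frequency) and the function and parameter names are illustrative assumptions, not the paper's specification.

```python
import math
from collections import Counter

def tfidf_weighted_query(story_tokens, doc_freq, num_docs):
    """Turn a whole story into a weighted query: each query term is
    weighted by its tf-idf value.

    story_tokens: tokenized text of the query story.
    doc_freq:     dict mapping term -> number of metadata documents
                  in the image collection containing that term.
    num_docs:     total number of metadata documents in the collection.

    Assumes the common tf * log(N / (1 + df)) formulation; the paper's
    exact weighting scheme may differ.
    """
    tf = Counter(story_tokens)
    return {
        term: freq * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, freq in tf.items()
    }
```

Retrieval can then score each image's metadata document by, for example, the inner product between these query weights and the document's own term weights, as in standard vector-space retrieval.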