We present methods for improving text-search retrieval of visual multimedia content by applying a set of visual models for semantic concepts drawn from a lexicon deemed relevant to the collection. Text search is performed via queries of individual words or full sentences, and results are returned as ranked video clips. Our approach involves a query expansion stage in which query terms are compared to the visual concepts, for which we independently build classifier models. We leverage a synonym dictionary and WordNet similarities during expansion. Results for each query are aggregated across the expanded terms and ranked. We validate our approach on the TRECVID 2005 broadcast news data with 39 concepts specifically designed for this genre of video. We observe that concept models improve search results by nearly 50% after model-based re-ranking of text-only search, and that purely model-based retrieval significantly outperforms text-based retrieval on non-named-entity queries.
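As a rough illustration of the expansion-and-re-ranking idea described above (a minimal sketch, not the authors' implementation), the following Python code maps query terms to lexicon concepts through a pluggable WordNet-style similarity measure and then blends concept-model scores with text-search scores to re-rank clips. All names and parameters here (`query_to_concepts`, `rerank`, the `threshold` and `lam` weights, and the shape of `concept_scores`) are hypothetical assumptions introduced for illustration; the similarity function could, for instance, wrap an off-the-shelf WordNet relatedness measure.

```python
from typing import Callable, Dict, List, Tuple


def query_to_concepts(query_terms: List[str],
                      concepts: List[str],
                      similarity: Callable[[str, str], float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Expand query terms to visual concepts whose lexical similarity
    exceeds a threshold, keeping the best similarity seen per concept."""
    weights: Dict[str, float] = {}
    for term in query_terms:
        for concept in concepts:
            sim = similarity(term, concept)
            if sim >= threshold:
                weights[concept] = max(weights.get(concept, 0.0), sim)
    return weights


def rerank(text_results: List[Tuple[str, float]],
           concept_scores: Dict[str, Dict[str, float]],
           concept_weights: Dict[str, float],
           lam: float = 0.5) -> List[Tuple[str, float]]:
    """Combine each clip's text-search score with a weighted sum of its
    concept-model scores, then sort clips by the blended score."""
    combined = []
    for clip_id, text_score in text_results:
        model_score = sum(w * concept_scores.get(clip_id, {}).get(c, 0.0)
                          for c, w in concept_weights.items())
        combined.append((clip_id, (1.0 - lam) * text_score + lam * model_score))
    return sorted(combined, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a purely model-based ranking corresponds to setting `lam = 1.0`, while `lam = 0.0` reduces to the text-only baseline; the actual aggregation and weighting scheme used in the paper may differ.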