We develop a framework for automatically discovering query classes for query-class-dependent search models in multimodal retrieval. The framework clusters the queries in a training set according to the performance of various unimodal search methods, yielding classes of queries that share similar fusion strategies for combining unimodal components in multimodal search. We further combine these performance features with semantic features of the queries during clustering so that the discovered classes are semantically meaningful. Including the semantic space also makes it possible to choose the correct class for new, unseen queries, whose performance-space features are unknown. We evaluate the system on the TRECVID 2004 automatic video search task and find that the automatically discovered query classes give an 18% improvement in mean average precision (MAP) over the hand-defined query classes used in previous work. We also find that some hand-defined query classes, such as "Named Person" and "Sports", do indeed show similar search method performance and are useful for query-class-dependent multimodal search, while others, such as "Named Object" and "General Object", do not have consistent search method performance and should be split apart or replaced with other classes. The proposed framework is general and can be applied to any new domain without expert domain knowledge.
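
The two ideas described above (clustering training queries in a joint performance and semantic space, then assigning an unseen query using its semantic features alone) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: plain k-means stands in for whatever clustering method is actually used, `alpha` is a hypothetical weight balancing the two feature spaces, the function names are invented, and the four toy queries carry made-up feature values (performance = per-method average precision for a text and an image search method; semantics = a two-dimensional indicator vector).

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic initialisation (first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def discover_classes(perf, sem, k, alpha=0.5):
    """Cluster training queries on weighted performance + semantic features.

    Returns the class label of each training query and, for every non-empty
    class, the centroid of its members in the semantic space only.
    """
    joint = [[alpha * x for x in p] + [(1 - alpha) * x for x in s]
             for p, s in zip(perf, sem)]
    labels, _ = kmeans(joint, k)
    sem_centroids = {}
    for c in set(labels):
        members = [s for s, lab in zip(sem, labels) if lab == c]
        sem_centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, sem_centroids


def classify_new(sem_features, sem_centroids):
    """Assign an unseen query using semantic features alone, since its
    performance features are unknown at query time."""
    return min(sem_centroids, key=lambda c: dist(sem_features, sem_centroids[c]))


# Toy training set (hypothetical values): [text AP, image AP] per query.
perf = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
# Toy semantic features (hypothetical): one dimension per semantic cue.
sem = [[1, 0], [0, 1], [1, 0], [0, 1]]

labels, sem_centroids = discover_classes(perf, sem, k=2)
new_query_class = classify_new([0.9, 0.1], sem_centroids)
```

In this sketch, once a new query has been assigned to a class, the fusion weights learned for that class from its training members would be applied to it. How those weights are learned is outside the scope of the example.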