Automatic discovery of query-class-dependent models for multimodal search

We develop a framework for the automatic discovery of query classes for query-class-dependent search models in multimodal retrieval. The framework automatically discovers useful query classes by clustering queries in a training set according to the performance of various unimodal search methods, yielding classes of queries which have similar fusion strategies for the combination of unimodal components for multimodal search. We further combine these performance features with the semantic features of the queries during clustering in order to make discovered classes meaningful. The inclusion of the semantic space also makes it possible to choose the correct class for new, unseen queries, which have unknown performance space features. We evaluate the system against the TRECVID 2004 automatic video search task and find that the automatically discovered query classes give an improvement of 18% in MAP over hand-defined query classes used in previous works. We also find that some hand-defined query classes, such as "Named Person" and "Sports" do, indeed, have similarities in search method performance and are useful for query-class-dependent multimodal search, while other hand-defined classes, such as "Named Object" and "General Object" do not have consistent search method performance and should be split apart or replaced with other classes. The proposed framework is general and can be applied to any new domain without expert domain knowledge.