Cluster-based data modeling for semantic video search

In this paper we present a novel approach to query-by-example that uses existing high-level semantics in the dataset. With visual topics, the provided examples are typically not diverse enough to build a robust model of the user's need in the descriptor space, so direct modeling using the topic examples as training data is inadequate. Alternatively, systems resort to multiple content-based searches using each example in turn, which typically yields poor results. We explore the relevance of visual concept models and how they help refine the query topics, and we propose a new technique that leverages the underlying semantics of the visual query topic examples to improve the search. We treat the semantic space as the descriptor space and intelligently model a query in that space. We use unlabeled data both to expand the diversity of the topic examples and to provide a robust set of negative examples that enable direct modeling. The approach models a positive and a pseudo-negative space using unbiased and biased methods for data sampling and data selection, and improves semantic retrieval by 12% over TRECVID 2006 topics. Moreover, we explore the visual context in fusion with text and visual search baselines and examine how this component can improve baseline retrieval results by expanding and re-ranking them. We apply the proposed methods in a multimodal video search system and show how the underlying semantics of the queries can significantly improve the overall visual search results, improving the baseline by over 46% and enhancing the performance of other modalities by at least 10%. We also demonstrate improved robustness over a range of query topic training examples and over query topics with varying visual support in TRECVID.
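To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of modeling a query directly in the semantic concept-score space: topic examples serve as positives, pseudo-negatives are sampled from the unlabeled collection, and a discriminative model ranks the collection. The concept lexicon size, the use of scikit-learn, the SVM classifier, and the purely random negative sampling (standing in for the paper's biased and unbiased sampling strategies) are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical concept-score vectors: each row describes a video shot by the
# detection scores of a lexicon of high-level visual concept models.
rng = np.random.default_rng(0)
n_concepts = 39                                    # assumed lexicon size
query_examples = rng.random((5, n_concepts))       # topic examples mapped into semantic space
collection = rng.random((10_000, n_concepts))      # unlabeled search collection, same space

# Pseudo-negatives: sample unlabeled shots, assuming most are irrelevant to any
# single topic; this supplies the negative class that direct modeling needs.
neg_idx = rng.choice(len(collection), size=50, replace=False)
pseudo_negatives = collection[neg_idx]

X = np.vstack([query_examples, pseudo_negatives])
y = np.concatenate([np.ones(len(query_examples)), np.zeros(len(pseudo_negatives))])

# Discriminative query model built in the semantic (concept-score) space.
model = SVC(kernel="rbf", probability=True, C=1.0, gamma="scale")
model.fit(X, y)

# Rank the whole collection by the model's relevance score.
scores = model.predict_proba(collection)[:, 1]
ranked = np.argsort(-scores)
print("Top-10 shot indices:", ranked[:10])
```

In the same spirit, the positive set could be expanded with the nearest unlabeled neighbors of the topic examples in concept space before training, which is one plausible reading of using unlabeled data to increase the diversity of the topic examples.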