Cluster-based data modeling for semantic video search

In this paper we present a novel approach to query-by-example that uses existing high-level semantics in the dataset. With visual topics, the provided examples are typically not diverse enough to build a robust model of the user's need in the descriptor space, so direct modeling using the topic examples as training data is inadequate. Alternatively, systems resort to multiple content-based searches using each example in turn, which typically yields poor results. We explore the relevance of visual concept models and how they help refine the query topics, and we propose a new technique that leverages the underlying semantics of the visual query topic examples to improve the search. We treat the semantic space as the descriptor space and intelligently model a query in that space. We use unlabeled data both to expand the diversity of the topic examples and to provide a robust set of negative examples that enable direct modeling. The approach models a positive and a pseudo-negative space using unbiased and biased methods for data sampling and data selection, and improves semantic retrieval by 12% over TRECVID 2006 topics. Moreover, we explore the visual context in fusion with text and visual search baselines and examine how this component can improve baseline retrieval results by expanding and re-ranking them. We apply the proposed methods in a multimodal video search system and show how the underlying semantics of the queries can significantly improve the overall visual search results, improving the baseline by over 46% and enhancing the performance of other modalities by at least 10%. We also demonstrate improved robustness over a range of query topic training examples and over query topics with varying visual support in TRECVID.
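To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of modeling a query directly in the semantic concept-score space: topic examples serve as positives, pseudo-negatives are sampled from the unlabeled collection, and a discriminative model ranks the collection. The concept lexicon size, the use of scikit-learn, the SVM classifier, and the purely random negative sampling (standing in for the paper's biased and unbiased sampling strategies) are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical concept-score vectors: each row describes a video shot by the
# detection scores of a lexicon of high-level visual concept models.
rng = np.random.default_rng(0)
n_concepts = 39                                    # assumed lexicon size
query_examples = rng.random((5, n_concepts))       # topic examples mapped into semantic space
collection = rng.random((10_000, n_concepts))      # unlabeled search collection, same space

# Pseudo-negatives: sample unlabeled shots, assuming most are irrelevant to any
# single topic; this supplies the negative class that direct modeling needs.
neg_idx = rng.choice(len(collection), size=50, replace=False)
pseudo_negatives = collection[neg_idx]

X = np.vstack([query_examples, pseudo_negatives])
y = np.concatenate([np.ones(len(query_examples)), np.zeros(len(pseudo_negatives))])

# Discriminative query model built in the semantic (concept-score) space.
model = SVC(kernel="rbf", probability=True, C=1.0, gamma="scale")
model.fit(X, y)

# Rank the whole collection by the model's relevance score.
scores = model.predict_proba(collection)[:, 1]
ranked = np.argsort(-scores)
print("Top-10 shot indices:", ranked[:10])
```

In the same spirit, the positive set could be expanded with the nearest unlabeled neighbors of the topic examples in concept space before training, which is one plausible reading of using unlabeled data to increase the diversity of the topic examples.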