MULTIMODAL FUSION FOR VIDEO SEARCH RERANKING

Analysis on click-through data from a very large search engine log shows that users are usually interested in the top-ranked portion of returned search results. Therefore, it is crucial for search engines to achieve high accuracy on the topranked documents. While many methods exist for boosting video search performance, they either pay less attention to the above factor or encounter difficulties in practical applications. In this paper, we present a flexible and effective re-ranking method, called CR-Re-ranking, to improve the retrieval effectiveness. To offer high accuracy on the top-ranked results, CRRe-ranking employs a cross-reference (CR) strategy to fuse multimodal cues. Specifically, multimodal features are first utilized separately to re-rank the initial returned results at the cluster level, and then all the ranked clusters from different modalities are cooperatively used to infer the shots with high relevance. Experimental results show that the search quality, especially on the top-ranked results, is improved significantly.