Multi-modal query expansion for web video search

Query expansion is an effective method to improve the usability of multimedia search. Most existing multimedia search engines are able to automatically expand a list of textual query terms based on text search techniques, which can be called textual query expansion (TQE). However, the annotations (title and tag) around web videos are generally noisier for text-only query expansion and search matching. In this paper, we propose a novel multi-modal query expansion (MMQE) framework for web video search to solve the issue. Compared with traditional methods, MMQE provides a more intuitive query suggestion by transforming tex-tual query to visual presentation based on visual clustering. Paral-lel to this, MMQE can enhance the process of search matching with strong pertinence of intent-specific query by joining textual, visual and social cues from both metadata and content of videos. Experimental results on real web videos from YouTube demon-strate the effectiveness of the proposed method.