Video search by multi-modal and clustering analysis

We introduce a video search system based on multi-modal and clustering analysis. Users can query the system by keywords or by visual content, such as whole frames or rectangular regions within a frame. The retrieved results are then organized into clusters, and only representative frames of these clusters are shown to users. Experiments on TRECVID datasets showed promising results.
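The abstract does not say which features or clustering algorithm the system uses, so the following is only a minimal sketch of the "cluster the results, show one representative frame per cluster" step. It assumes each retrieved frame already has a numeric feature vector (the frame IDs and vectors are hypothetical) and uses plain k-means as a stand-in clustering method, not necessarily the authors' choice.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means (stand-in method); centroids start at the first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each frame joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def representative_frames(frames: Dict[str, Vector], k: int) -> Dict[int, str]:
    """Cluster retrieved frames; per cluster, return the frame closest to the cluster mean."""
    ids = list(frames)
    points = [frames[i] for i in ids]
    labels = kmeans(points, k)
    reps: Dict[int, str] = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in zip(ids, labels) if lab == c]
        mean = [sum(frames[m][d] for m in members) / len(members)
                for d in range(len(points[0]))]
        reps[c] = min(members, key=lambda m: _dist2(frames[m], mean))
    return reps


# Hypothetical 2-D features for six retrieved frames from two visually distinct shots.
results = {
    "a1": (0.0, 0.0), "a2": (0.2, 0.0), "a3": (0.1, 0.1),
    "b1": (5.0, 5.0), "b2": (5.2, 5.0), "b3": (5.1, 5.1),
}
print(representative_frames(results, k=2))
```

Choosing the member nearest the cluster mean, rather than the mean vector itself, guarantees that each representative is an actual frame that can be displayed to the user.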
