Interactive Spatio-Temporal Visual Map Model for Web Video Retrieval

The massive amount of multimedia information available on the Web, especially video, calls for more precise and interactive retrieval. Current operational video retrieval systems do not exploit implicit visual features; they rely only on the textual metadata supplied by the user at upload time. This degrades retrieval performance, because the metadata may be neither comprehensive nor consistent. In this paper, we describe the use of a spatio-temporal visual map (STVM) model to supplement Web video retrieval. The model uses spatio-temporal visual similarity to rerank the text-retrieval results and to find new results. Experimental results on a dynamic Web video corpus show significant improvement with the STVM model, along with good usability scores from human users.
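
As an illustration of the reranking idea only, the following minimal sketch fuses a text-retrieval score with a visual-similarity score and sorts the union of candidates. The abstract does not specify how the two signals are combined, so the linear fusion, the weight `alpha`, the function name `rerank`, and the example scores are all assumptions made for this sketch rather than the STVM method itself. Videos that appear only in the visual-similarity input stand in for the "new results" that text retrieval alone would miss.

```python
from typing import Dict, List, Tuple


def rerank(text_scores: Dict[str, float],
           visual_sim: Dict[str, float],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Rerank videos by a weighted sum of text and visual scores.

    Hypothetical fusion rule (not taken from the paper):
        fused = alpha * text + (1 - alpha) * visual
    A video missing from either input contributes 0.0 for that signal.
    """
    # The union lets visually similar videos enter the list even when
    # the text query did not retrieve them.
    candidates = set(text_scores) | set(visual_sim)
    fused = {
        vid: alpha * text_scores.get(vid, 0.0)
        + (1.0 - alpha) * visual_sim.get(vid, 0.0)
        for vid in candidates
    }
    # Highest fused score first; ties are broken by video id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores: "c" is not a text-retrieval hit, only a visual match.
text_scores = {"a": 0.9, "b": 0.6}
visual_sim = {"a": 0.2, "b": 0.8, "c": 0.9}
ranked = rerank(text_scores, visual_sim, alpha=0.5)
print([vid for vid, _ in ranked])
```

With these illustrative inputs, video "b" overtakes "a" because of its stronger visual match, and "c" enters the list purely on visual similarity. In practice the weight would have to be tuned against relevance judgments.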