Speech technology plays a key role in video semantic indexing

Video semantic indexing is a core task in content-based video retrieval (CBVR): a user submits a text query for an object or scene to a search system, and the system returns the video shots that contain it. We introduce an emerging framework for this task that relies heavily on statistical speaker verification and adaptation techniques. It uses Gaussian mixture model (GMM) supervectors and support vector machines (SVMs) to robustly detect a wide variety of objects and scenes in video. It has shown excellent performance in the Semantic Indexing task of the TRECVID 2011 workshop, where a large archive of consumer-produced Internet videos is used for evaluation.
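
As a rough illustration of what a GMM supervector is, the following minimal sketch follows the recipe commonly used in speaker verification: the means of a background GMM are MAP-adapted to the features of one video shot, and the adapted means are normalized and concatenated into a single vector. This is an assumption about the general technique, not a description of the exact system evaluated at TRECVID. The function name, the diagonal-covariance model, the relevance factor of 16, and the sqrt(weight / variance) normalization are all illustrative choices.

```python
import math

def map_adapted_supervector(frames, weights, means, variances, relevance=16.0):
    """Build a GMM supervector for one shot (hypothetical helper).

    frames:    list of D-dimensional feature vectors extracted from the shot
    weights, means, variances: a diagonal-covariance background GMM
                               with K components (assumed already trained)
    relevance: MAP relevance factor (illustrative default)
    """
    K, D = len(means), len(means[0])
    n = [0.0] * K                        # soft frame counts per component
    s = [[0.0] * D for _ in range(K)]    # posterior-weighted feature sums

    for x in frames:
        # log( w_k * N(x | mu_k, diag(var_k)) ) for every component
        logp = []
        for k in range(K):
            ll = math.log(weights[k])
            for d in range(D):
                ll -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                             + (x[d] - means[k][d]) ** 2 / variances[k][d])
            logp.append(ll)
        # Posterior of each component, computed stably in the log domain
        m = max(logp)
        post = [math.exp(v - m) for v in logp]
        z = sum(post)
        for k in range(K):
            g = post[k] / z
            n[k] += g
            for d in range(D):
                s[k][d] += g * x[d]

    supervector = []
    for k in range(K):
        # Components that saw more data move further from the background mean
        alpha = n[k] / (n[k] + relevance)
        for d in range(D):
            data_mean = s[k][d] / n[k] if n[k] > 0 else means[k][d]
            adapted = alpha * data_mean + (1 - alpha) * means[k][d]
            # Scale by sqrt(weight / variance) so that a linear kernel
            # approximates a distance between adapted GMMs (assumed choice)
            supervector.append(math.sqrt(weights[k] / variances[k][d]) * adapted)
    return supervector
```

Under these assumptions, each shot yields one fixed-length vector of K x D values regardless of how many frames it contains, which is what makes it possible to train one SVM per object or scene category on the supervectors of labeled shots.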