The value of stories for speech-based video search

Anecdotal evidence suggests that story-level information is important for the speech component of video retrieval. In this paper we perform a systematic examination of the combination of shot-level and story-level speech, using a document expansion approach. We isolate speech from other retrieval features, and evaluate on the 2003–2006 TRECVID test sets with a set of 94 natural language queries. Our main finding is that the use of story information significantly improves retrieval performance compared to shot-based search, increasing overall mean average precision by over 65%.
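
To illustrate what combining shot-level and story-level speech through document expansion can look like, the following is a minimal sketch and not the method evaluated in the paper. It assumes a simple linear interpolation of unigram term distributions; the function names and the weight `lam` are hypothetical choices made for illustration.

```python
from collections import Counter


def term_distribution(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


def expand_shot(shot_tokens, story_tokens, lam=0.5):
    """Mix a shot's term distribution with that of its surrounding story.

    lam is a hypothetical interpolation weight on the shot's own speech;
    (1 - lam) goes to the story. This is an illustrative assumption,
    not a parameter taken from the paper.
    """
    p_shot = term_distribution(shot_tokens)
    p_story = term_distribution(story_tokens)
    vocabulary = set(p_shot) | set(p_story)
    return {
        term: lam * p_shot.get(term, 0.0) + (1 - lam) * p_story.get(term, 0.0)
        for term in vocabulary
    }


# A shot whose transcript contains one word, inside a four-word story.
shot = ["election"]
story = ["election", "results", "senate", "results"]
expanded = expand_shot(shot, story, lam=0.5)
```

In this toy case the expanded shot gives non-zero weight to "results" and "senate", terms spoken elsewhere in the story, so a query on those terms can match a shot whose own transcript never mentions them.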
