Video retrieval using speech and image information

Video contains multiple types of audio and visual information, which are difficult to extract, combine, or trade off against one another in general video information retrieval. This paper evaluates how the different types of information available in a video collection affect retrieval performance. Most typical broadcast video collections contain a number of such sources, each of which can be exploited for information retrieval. We discuss the contributions of automatically recognized speech transcripts, image similarity matching, face detection, and video OCR in the context of experiments performed for the 2001 TREC Video Retrieval Track evaluation, organized by the National Institute of Standards and Technology (NIST). For the queries used in this evaluation, image matching and video OCR proved to be the decisive components of video information retrieval.
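The abstract notes that the information sources are hard to combine or trade off. One common way to combine them is weighted linear late fusion of per-source shot scores after min-max normalization. That technique is shown here only as an illustration; it is not presented as the method evaluated in the paper. The source names, weights, shot identifiers, and function names below are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the four information types named in the abstract.
SOURCES = ("speech", "image", "face", "ocr")


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one source's shot scores to [0, 1] so different sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All shots tied: this source carries no ranking signal, so it contributes 0.
        return {shot: 0.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def fuse(
    per_source: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized per-source scores, ranked best-first.

    A shot missing from a source simply receives no contribution from it.
    Sources without a weight default to 0 (ignored).
    """
    combined: Dict[str, float] = {}
    for source, scores in per_source.items():
        w = weights.get(source, 0.0)
        for shot, s in min_max_normalize(scores).items():
            combined[shot] = combined.get(shot, 0.0) + w * s
    # Sort by descending score; break ties by shot id for a stable ordering.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with speech scores `{"shot1": 2.0, "shot2": 6.0, "shot3": 4.0}`, image scores `{"shot1": 0.9, "shot3": 0.1}`, OCR scores `{"shot2": 3.0, "shot3": 1.0}`, and weights `{"speech": 1.0, "image": 2.0, "ocr": 1.5}`, the fused ranking is `shot2` (2.5), `shot1` (2.0), `shot3` (0.5). Raising a source's weight is one simple way to trade it off against the others; how the weights should be chosen is left open here.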
