TRECVID 2004 Search and Feature Extraction Task by NUS PRIS

This paper describes our systems for the feature extraction and search tasks of TRECVID 2004. For feature extraction, we emphasize an automatic visual concept annotation technique, fused with text and specialized detectors, to induce concepts in videos. For the search task, our emphasis is two-fold: first, we employ query-specific models; second, we employ multi-modality features, including text, annotated visual concepts, OCR output, shot classes and specialized detectors. Our search pipeline is similar to those used in text-based definition question-answering systems. We first perform query analysis to classify each query into one of six categories: {PERSON, SPORTS, FINANCE, WEATHER, DISASTER, GENERAL}. From the assigned category, we induce a number of constraints on the search process, including: (a) the multi-modality features to use or emphasize; (b) the key concept terms in the text query to use; and (c) the video classes, such as sports or anchorperson shots, to include or exclude in the search results. Results on the 60 hours of test video from the TRECVID 2004 evaluation demonstrate that our approaches are effective.
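The query-analysis step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system described in this paper: the cue-word lists, the modality names (text, ocr, face, concept), the shot-class labels and the fusion weights are all hypothetical placeholders, and per-modality scores for each shot are assumed to be precomputed (for example, by a text retriever fed with the key terms). It shows how a single query class can drive (a) modality weighting, (b) key-term selection and (c) shot-class exclusion.

```python
import re
from dataclasses import dataclass, field

# HYPOTHETICAL cue words per query class (illustrative only, not the paper's lists).
CATEGORY_CUES = {
    "PERSON": {"president", "minister", "senator", "pope", "clinton"},
    "SPORTS": {"basketball", "hockey", "golf", "tennis", "soccer", "goal"},
    "FINANCE": {"stock", "market", "dow", "nasdaq", "dollar"},
    "WEATHER": {"weather", "forecast", "rain", "snow", "temperature"},
    "DISASTER": {"fire", "flood", "earthquake", "explosion", "crash"},
}

# HYPOTHETICAL stopword list used to pick out key concept terms.
STOPWORDS = {"find", "shots", "shot", "of", "a", "an", "the", "with", "in", "on", "or", "and"}


@dataclass
class SearchConstraints:
    weights: dict                                       # (a) modality -> fusion weight
    exclude_classes: set = field(default_factory=set)   # (c) shot classes to drop


# HYPOTHETICAL per-class constraints; the weights are placeholders, not tuned values.
CONSTRAINTS = {
    "PERSON": SearchConstraints({"text": 0.5, "ocr": 0.3, "face": 0.2}, {"anchor"}),
    "SPORTS": SearchConstraints({"text": 0.4, "concept": 0.6}, {"anchor"}),
    "FINANCE": SearchConstraints({"text": 0.5, "ocr": 0.5}, {"anchor"}),
    "WEATHER": SearchConstraints({"text": 0.3, "concept": 0.7}, {"anchor"}),
    "DISASTER": SearchConstraints({"text": 0.4, "concept": 0.6}, {"anchor", "sports"}),
    "GENERAL": SearchConstraints({"text": 0.5, "concept": 0.5}, {"anchor"}),
}


def tokenize(query):
    """Lowercase the query and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", query.lower())


def classify_query(query):
    """Pick the class with the most cue-word hits (ties go to the earlier class); GENERAL if none."""
    tokens = tokenize(query)
    counts = {cat: sum(t in cues for t in tokens) for cat, cues in CATEGORY_CUES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "GENERAL"


def key_terms(query):
    """(b) Keep the non-stopword tokens of the query as key concept terms."""
    return [t for t in tokenize(query) if t not in STOPWORDS]


def rank_shots(query, shots):
    """Drop excluded shot classes, then rank shots by a weighted sum of modality scores."""
    constraints = CONSTRAINTS[classify_query(query)]
    ranked = []
    for shot in shots:
        if shot["shot_class"] in constraints.exclude_classes:
            continue
        score = sum(w * shot["scores"].get(m, 0.0) for m, w in constraints.weights.items())
        ranked.append((shot["id"], round(score, 4)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    example_shots = [
        {"id": "s1", "shot_class": "sports", "scores": {"text": 0.2, "concept": 0.9}},
        {"id": "s2", "shot_class": "anchor", "scores": {"text": 0.9, "concept": 0.1}},
        {"id": "s3", "shot_class": "general", "scores": {"text": 0.6, "concept": 0.3}},
    ]
    print(rank_shots("Find shots of an ice hockey goal", example_shots))
    # -> [('s1', 0.62), ('s3', 0.42)]
```

Here the hockey query is classified as SPORTS, so the anchorperson shot is removed before ranking and the visual-concept score outweighs the text score. A weighted linear sum is only one possible fusion rule; it is used in this sketch because it makes the effect of the per-class weights easy to see.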
