COST292 experimental framework for TRECVID2008

In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application that combines retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach that removes similar scenes using the eigenvalues of the frame similarity matrix, and on a redundancy removal strategy that depends on the extraction of semantic features such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos.
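To illustrate the idea behind the rushes task, the following is a minimal sketch of how the eigenvalues of a frame similarity matrix can indicate how many distinct scenes a sequence contains. It is not the COST292 implementation: the Gaussian similarity, the `sigma` parameter, the eigengap heuristic and the function names `frame_similarity` and `estimate_num_scenes` are all assumptions made for illustration.

```python
import numpy as np


def frame_similarity(features, sigma=1.0):
    """Gaussian similarity between per-frame feature vectors (assumed measure)."""
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def estimate_num_scenes(similarity, max_k=10):
    """Estimate the number of distinct scenes with an eigengap heuristic.

    The similarity matrix W is normalised as D^{-1/2} W D^{-1/2}. For k
    well-separated groups of frames, k eigenvalues lie close to 1, so the
    largest gap between consecutive sorted eigenvalues marks k.
    """
    degree = similarity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalised = similarity * inv_sqrt[:, None] * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(normalised)[::-1]  # descending order
    k_max = min(max_k, len(eigvals) - 1)
    gaps = eigvals[:k_max] - eigvals[1:k_max + 1]
    return int(np.argmax(gaps)) + 1, eigvals


if __name__ == "__main__":
    # Hypothetical data: three repeated "takes", each with 8 near-identical frames.
    rng = np.random.default_rng(0)
    centres = np.array([0.0, 10.0, 20.0])
    feats = np.concatenate(
        [c + 0.05 * rng.standard_normal((8, 4)) for c in centres]
    )
    k, _ = estimate_num_scenes(frame_similarity(feats))
    print("estimated number of distinct scenes:", k)
```

Frames that fall into the same group can then be treated as redundant, and one representative per group kept for the summary. In practice the features would come from real frame descriptors rather than synthetic vectors.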
