Moderate Vocabulary Visual Concept Detection for the TRECVID 2002

The explosion in the availability of multimodal content underlines the need for content management at a semantic level. We cast the problem of detecting semantics in multimedia content as a pattern classification problem, and the problem of building models of multimodal semantics as a learning problem. Statistical machine learning is increasingly used as a computational framework for mapping low-level media features to high-level semantic concepts. In this chapter we expose the challenges that these techniques face. We show that if a lexicon of visual concepts is identified a priori, a statistical framework can be used to build visual feature models for the concepts in that lexicon. Using support vector machine (SVM) classification, we build models for 34 semantic concepts on the TREC 2002 benchmark corpus. We study how the number of training examples available for a concept affects detection performance. We also examine low-level feature fusion and the sensitivity of SVM classifiers to their parameters.
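As a rough illustration of the pipeline the abstract describes (fuse low-level features, then train one binary detector per concept), the sketch below trains a single linear SVM on synthetic data. Everything specific in it is an assumption made for illustration: the two-dimensional "color" and "texture" features, fusion by concatenation, the Pegasos-style subgradient solver, and the values of `lam` and `epochs`. The abstract does not state the features, kernel, or parameters used in the chapter.

```python
import random

def fuse(color_feats, texture_feats):
    """Early fusion (assumed scheme): concatenate the per-shot feature vectors."""
    return [c + t for c, t in zip(color_feats, texture_feats)]

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    Labels in y are -1 or +1. Returns the weights, with the bias as the last entry."""
    rng = random.Random(seed)
    Xb = [x + [1.0] for x in X]  # constant 1 so the bias is learned as a weight
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:  # example violates the margin: move toward it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, x):
    """Return +1 if the concept is detected, otherwise -1."""
    s = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if s >= 0.0 else -1

def accuracy(w, X, y):
    return sum(predict(w, x) == yi for x, yi in zip(X, y)) / len(y)

def make_shots(n, rng):
    """Synthetic stand-in for annotated shots of one hypothetical concept."""
    color, texture, labels = [], [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        color.append([rng.gauss(label, 0.5) for _ in range(2)])
        texture.append([rng.gauss(label, 0.5) for _ in range(2)])
        labels.append(label)
    return color, texture, labels

rng = random.Random(0)
color, texture, labels = make_shots(200, rng)
X = fuse(color, texture)
w = train_linear_svm(X[:150], labels[:150])
acc = accuracy(w, X[150:], labels[150:])
print(f"held-out accuracy: {acc:.2f}")
```

Retraining with a smaller training slice (for example `X[:20]`) or a different `lam` gives a toy version of the training-set-size and parameter-sensitivity studies mentioned above. A lexicon of several concepts would repeat the same training call once per concept, each with its own labels.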
