IBM Research and Columbia University TRECVID-2012 Multimedia Event Detection (MED), Multimedia Event Recounting (MER), and Semantic Indexing (SIN) Systems

For this year’s TRECVID Multimedia Event Detection task, our team studied high-level visual and audio semantic features, mid-level visual attributes, and sophisticated low-level features. In addition, we studied a range of new modeling strategies, including strategies that take into account the temporal dynamics of event semantics, optimize the fusion of system components, provide linear approximations of non-linear kernels, and generate synthetic data for the limited exemplar condition. For the Pre-Specified task, we submitted 4 runs. Run 1 fused a broad array of sophisticated low-level features. Run 2 used the same set of low-level features to model the events under the limited exemplar condition. Run 3 fused all of our semantic system components. Run 4 fused all low-level and semantic features used in Runs 1-3, together with event models built using techniques for linear approximation of non-linear kernels. For the Ad Hoc task, we submitted 2 runs. Run 5 fused Linear Temporal Pyramids of visual semantics with event models built directly on low-level features. Run 6 was our limited exemplar run, which used both Linear Temporal Pyramids of visual semantics and a method for generating synthetic training data. Our experiments suggest the following: 1) Semantic models improve event modeling performance over the low-level features on which they are based. 2) Mid-level visual attributes contribute complementary information. 3) Event videos demonstrate temporal patterns. 4) Linear approximations of non-linear kernels perform similarly to the original non-linear kernels.
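
The abstract does not say which linear approximation of non-linear kernels the runs used. As a hedged illustration only, the sketch below assumes the explicit feature map of Vedaldi and Zisserman for the additive chi-squared kernel, a common choice for histogram features such as bag-of-words; the function names `chi2_feature_map` and `additive_chi2`, the sample count `n`, and the sampling period `L` are hypothetical and not taken from the paper. Inputs are assumed to be non-negative arrays of shape (n_samples, n_features).

```python
import numpy as np


def chi2_feature_map(X, n=3, L=0.5):
    """Approximate explicit feature map for the additive chi-squared kernel.

    Per dimension the kernel is k(x, y) = 2xy / (x + y). Following the
    sampled-spectrum construction of Vedaldi and Zisserman, each
    non-negative input value is mapped to 2n + 1 features so that
    phi(X) @ phi(Y).T approximates sum_d k(X_d, Y_d) with a dot product.
    n (number of spectrum samples) and L (sampling period) are
    illustrative defaults, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("the chi2 feature map requires non-negative inputs")
    # Safe log: zero entries get log(1) = 0, but their scale factors below
    # are sqrt(0) = 0, so they map to the zero vector (k(0, y) = 0).
    logx = np.log(np.where(X > 0, X, 1.0))
    # Spectrum of the chi2 kernel: kappa(lambda) = sech(pi * lambda), kappa(0) = 1.
    feats = [np.sqrt(X * L)]
    for j in range(1, n + 1):
        kappa_j = 1.0 / np.cosh(np.pi * j * L)
        scale = np.sqrt(2.0 * X * L * kappa_j)
        feats.append(scale * np.cos(j * L * logx))
        feats.append(scale * np.sin(j * L * logx))
    # Result has shape (n_samples, n_features * (2n + 1)).
    return np.concatenate(feats, axis=1)


def additive_chi2(X, Y):
    """Exact additive chi2 kernel matrix, used here only for comparison."""
    X = np.asarray(X, dtype=float)[:, None, :]
    Y = np.asarray(Y, dtype=float)[None, :, :]
    den = X + Y
    safe_den = np.where(den > 0, den, 1.0)
    return np.sum(np.where(den > 0, 2.0 * X * Y / safe_den, 0.0), axis=2)


# Usage: L1-normalized random histograms stand in for real video features.
rng = np.random.default_rng(0)
H = rng.random((5, 16))
H /= H.sum(axis=1, keepdims=True)
Phi = chi2_feature_map(H)
K_exact = additive_chi2(H, H)
K_approx = Phi @ Phi.T  # plain linear kernel on the mapped features
print(np.max(np.abs(K_approx - K_exact) / K_exact))
```

Under these assumptions, a linear classifier trained on `Phi` behaves approximately like a chi2-kernel SVM, while training cost grows linearly rather than quadratically with the number of videos. The printed maximum relative error is a quick check of how close the approximation is for the chosen `n` and `L`.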
