VIREO/ECNU @ TRECVID 2013: A Video Dance of Detection, Recounting and Search with Motion Relativity and Concept Learning from Wild

The VIREO group participated in four tasks: instance search, multimedia event recounting, multimedia event detection, and semantic indexing. In this paper, we present our approaches and discuss the evaluation results.

Instance Search (INS): We submitted four runs in total, experimenting with three search paradigms for particular object retrieval: (1) an elastic spatial consistency checking method; (2) a background context weighting strategy; and (3) a re-ranking step based on object mining. The first two approaches are similar to those we used last year [1], while the last one is a new exploration. All our submissions are based on the bag-of-words (BoW) model and tailored to the INS task. In particular, we use Delaunay Triangulation (DT) to handle the complex spatial transformations of non-planar and non-rigid queries; the lack of information for small query objects is tackled with context modeling; and object mining augments the results by exploring frequent instances in TV series.

- F X NO vireo dt 2: BoW method + elastic spatial checking via DT. This run corresponds to paradigm (1), which models elastic spatial structures as deformable graphs.
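To make the idea of elastic spatial checking via DT more concrete, the following is a minimal Python sketch, not the implementation used in the submitted runs. It assumes that matched keypoints are given as two coordinate lists in matching order, builds a Delaunay triangulation on each side with a brute-force empty-circumcircle test (only suitable for small point sets), and scores a candidate by the fraction of query edges that also appear in the candidate graph. The function names (`delaunay_edges`, `topology_score`) and the scoring rule are assumptions made for illustration.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of triangle abc."""
    if _orient(a, b, c) < 0:
        b, c = c, b  # enforce counter-clockwise order for the determinant test
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    return det > 1e-9


def delaunay_edges(points):
    """Edges (i, j) with i < j of a brute-force Delaunay triangulation."""
    edges = set()
    n = len(points)
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if abs(_orient(a, b, c)) < 1e-9:
            continue  # skip degenerate (collinear) triples
        others = (points[m] for m in range(n) if m not in (i, j, k))
        if any(_in_circumcircle(a, b, c, p) for p in others):
            continue  # circumcircle is not empty, so not a Delaunay triangle
        edges.update({(i, j), (i, k), (j, k)})
    return edges


def topology_score(query_pts, cand_pts):
    """Fraction of query DT edges preserved in the candidate DT (assumed score)."""
    query_edges = delaunay_edges(query_pts)
    if not query_edges:
        return 0.0
    return len(query_edges & delaunay_edges(cand_pts)) / len(query_edges)


# Hypothetical matched keypoints: index i in `query` is matched to index i in `cand`.
query = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
cand = [(10 + 2 * x, 20 + 2 * y) for x, y in query]  # scaled and shifted copy
print(topology_score(query, cand))
```

Because the comparison is made on graph edges rather than on a single global transformation, a candidate whose matched points are stretched or bent can still score highly as long as the neighborhood structure is preserved, which is the intuition behind treating spatial structures as deformable graphs.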

[1] Chong-Wah Ngo, et al. Sampling and Ontologically Pooling Web Images for Visual Concept Learning, 2012, IEEE Transactions on Multimedia.

[2] Cordelia Schmid, et al. Scale & Affine Invariant Interest Point Detectors, 2004, International Journal of Computer Vision.

[3] Koen E. A. van de Sande, et al. Evaluating Color Descriptors for Object and Scene Recognition, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] A. Smeaton, et al. TRECVID 2013 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics, 2011.

[5] Chong-Wah Ngo, et al. Video event detection using motion relativity and visual relatedness, 2008, ACM Multimedia.

[6] Ha Hong, et al. The Neural Representation Benchmark and its Evaluation on Brain and Machine, 2013, ICLR.

[7] Chong-Wah Ngo, et al. Searching visual instances with topology checking and context modeling, 2013, ICMR.

[8] Michael Isard, et al. Lost in quantization: Improving particular object retrieval in large scale image databases, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Hugo Liu, et al. ConceptNet — A Practical Commonsense Reasoning Tool-Kit, 2004.

[10] Chong-Wah Ngo, et al. Video concept detection by learning from web images: A case study on cross domain learning, 2013, 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).

[11] Chong-Wah Ngo, et al. Predicting domain adaptivity: redo or recycle?, 2012, ACM Multimedia.

[12] Chong-Wah Ngo, et al. Snap-and-ask: answering multimodal question by naming visual instance, 2012, ACM Multimedia.

[13] Paul Over, et al. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements, 2009.

[14] Cordelia Schmid, et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, 2008, ECCV.

[15] Anthony W. Kay, et al. Tesseract: an open-source optical character recognition engine, 2007.

[16] Dong Xu, et al. Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Chong-Wah Ngo, et al. VIREO @ TRECVID 2012: Searching with Topology, Recounting with Small Concepts, Learning with Free Examples, 2012, TRECVID.

[18] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.