Multimedia Event Detection and Recounting

In the Multimedia Event Detection (MED) 2013 evaluation, the SRI Aurora team participated in the EK100, EK10, and EK0 tasks with full system evaluation. We submitted 15 runs for both pre-specified events (PS-Events) and ad-hoc events (AH-Events), and the majority achieved satisfactory results. In particular, thanks to well-designed concept features, our EK10 system performed consistently much better on both PS-Events and AH-Events. By building a concept language model from web sources, we constructed an EK0 system that performs event detection without training examples; this system achieved promising results on PS-Events. In the Multimedia Event Recounting (MER) task, we developed an approach that explains why a MED decision was made by breaking down the evidence used by the SVM-based event detector. Furthermore, we designed evidence-specific verification and detection to reduce uncertainty and improve key evidence discovery.
