Informedia @ TRECVID 2016

We report on our system used in the TRECVID 2016 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in 000Ex, 010Ex and 100Ex settings for the Pre-specified Events. On the AVS task, the CMU team submitted runs for fully-automatic system with no annotation condition.

[1]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Lori Lamel Multilingual Speech Processing Activities in Quaero: Application to Multimedia Search in Unstructured Data , 2012, Baltic HLT.

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Tao Mei,et al.  MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Lorenzo Torresani,et al.  C3D: Generic Features for Video Analysis , 2014, ArXiv.

[7]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Trevor Darrell,et al.  YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[11]  Francis Ferraro,et al.  Visual Storytelling , 2016, NAACL.

[12]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14]  Alexander G. Hauptmann,et al.  Which Information Sources are More Effective and Reliable in Video Search , 2016, SIGIR.

[15]  Gareth J. F. Jones,et al.  Evaluating Search and Hyperlinking: An Example of the Design, Test, Refine Cycle for Metric Development , 2015, MediaEval.

[16]  Jonathan G. Fiscus,et al.  TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking , 2016, TRECVID.

[17]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Karen Spärck Jones A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.

[19]  Dimosthenis Karatzas,et al.  TextProposals: A text-specific selective search algorithm for word spotting in the wild , 2016, Pattern Recognit..

[20]  Li Fei-Fei,et al.  DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Trevor Darrell,et al.  Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[23]  David A. Shamma,et al.  YFCC100M , 2015, Commun. ACM.

[24]  Yi Yang,et al.  Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Maria Eskevich,et al.  Defining and Evaluating Video Hyperlinking for Navigating Multimedia Archives , 2015, WWW.

[26]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[27]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Craig MacDonald,et al.  From Puppy to Maturity: Experiences in Developing Terrier , 2012, OSIR@SIGIR.

[29]  Paul Deléglise,et al.  Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks , 2014, LREC.