Video Indexing, Search, Detection, and Description with Focus on TRECVID

There has been tremendous growth in video data over the last decade. People use mobile phones and tablets to record, share, and watch videos more than ever before, and video cameras surround us almost everywhere in the public domain (e.g., stores, streets, and public facilities). Efficient and effective retrieval methods are therefore critically needed across many applications. The goal of TRECVID is to encourage research in content-based video retrieval by providing large test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. In this tutorial, we present and discuss some of the most important and fundamental content-based video retrieval problems: recognizing predefined visual concepts, searching videos for complex ad-hoc user queries, searching by image/video example to retrieve specific objects, persons, or locations, detecting events, and finally bridging the gap between vision and language by examining how systems can automatically describe videos in natural language. A review of the state of the art, current challenges, and future directions, along with pointers to useful resources, will be presented by regular TRECVID participating teams. Each team will present one of the following tasks:

- Semantic INdexing (SIN)
- Zero-example (0Ex) Video Search (AVS)
- Instance Search (INS)
- Multimedia Event Detection (MED)
- Video to Text (VTT)
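To make the ad-hoc/zero-example search setting concrete, below is a minimal sketch, not taken from any participating system, of the common cross-modal retrieval pattern: a textual query is mapped into the same embedding space as precomputed video representations, and videos are ranked by cosine similarity. The encoder, the embedding dimensionality, and the toy data here are all illustrative assumptions; real systems learn these mappings from paired video-text or concept-annotated data.

import numpy as np

def embed_text(query, dim=512):
    """Stand-in for a learned text encoder (hypothetical).

    A real zero-example system would map the query into a visual or
    joint embedding space with a trained model; here we just hash the
    query into a deterministic random vector for demonstration.
    """
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    return rng.standard_normal(dim)

def rank_videos(query_vec, video_vecs):
    """Rank videos by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    scores = v @ q                # cosine similarity per video
    order = np.argsort(-scores)   # best match first
    return order, scores[order]

# Toy collection: 5 videos with random 512-d "embeddings".
videos = np.random.default_rng(1).standard_normal((5, 512))
order, scores = rank_videos(embed_text("a person riding a bicycle"), videos)
print(list(zip(order.tolist(), scores.round(3).tolist())))

Systems presented in the tutorial differ mainly in how the two encoders are obtained (e.g., learned jointly from captioned video versus composed from banks of pretrained concept detectors); the ranking step itself is typically this simple nearest-neighbor search.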
