The MediaMill at TRECVID 2013: : Searching concepts, Objects, Instances and events in video

In this paper we summarize our TRECVID 2014 [12] video retrieval experiments. The MediaMill team participated in five tasks: concept detection, object localization, instance search, event recognition and recounting. We experimented with concept detection using deep learning and color difference coding [17], object localization using FLAIR [23], instance search by one example [19], event recognition based on VideoStory [4], and event recounting using COSTA [10]. Our experiments focus on establishing the video retrieval value of these innovations. The 2014 edition of the TRECVID benchmark has again been a fruitful participation for the MediaMill team, resulting in the best result for concept detection and object localization.

[1]  Masoud Mazloom,et al.  Querying for video events by semantic signatures from few examples , 2013, MM '13.

[2]  Paul Over,et al.  Creating HAVIC: Heterogeneous Audio Visual Internet Collection , 2012, LREC.

[3]  Cees Snoek,et al.  COSTA: Co-Occurrence Statistics for Zero-Shot Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Arnold W. M. Smeulders,et al.  Locality in Generic Instance Search from One Example , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Xirong Li,et al.  Evaluating sources and strategies for learning video concepts from social media , 2013, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI).

[7]  Jiri Matas,et al.  Efficient representation of local geometry for large scale object retrieval , 2009, CVPR.

[8]  Koen E. A. van de Sande,et al.  Recommendations for video event recognition using concept vocabularies , 2013, ICMR.

[9]  Paul Over,et al.  TRECVID 2008 - Goals, Tasks, Data, Evaluation Mechanisms and Metrics , 2010, TRECVID.

[10]  Cees Snoek,et al.  Recommendations for recognizing video events by concept vocabularies , 2014, Comput. Vis. Image Underst..

[11]  Masoud Mazloom,et al.  Searching informative concept banks for video event detection , 2013, ICMR.

[12]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Arnold W. M. Smeulders,et al.  Visual-Concept Search Solved? , 2010, Computer.

[14]  Kan Chen,et al.  The 2013 SESAME Multimedia Event Detection and Recounting System , 2013, TRECVID.

[15]  Marcel Worring,et al.  Bootstrapping Visual Categorization With Relevant Negatives , 2013, IEEE Transactions on Multimedia.

[16]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[17]  Georges Quénot,et al.  TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics , 2011, TRECVID.

[18]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[19]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[20]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Koen E. A. van de Sande,et al.  Fisher and VLAD with FLAIR , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[24]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[25]  Cees Snoek,et al.  VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events , 2014, ACM Multimedia.

[26]  Florent Perronnin,et al.  Modeling the spatial layout of images beyond spatial pyramids , 2012, Pattern Recognit. Lett..

[27]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[28]  Marcel Worring,et al.  Learning Social Tag Relevance by Neighbor Voting , 2009, IEEE Transactions on Multimedia.

[29]  Yannis Avrithis,et al.  To Aggregate or Not to aggregate: Selective Match Kernels for Image Search , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Ramakant Nevatia,et al.  Evaluating multimedia features and fusion for example-based event detection , 2013, Machine Vision and Applications.

[31]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Masoud Mazloom,et al.  Conceptlets: Selective Semantics for Classifying Video Events , 2014, IEEE Transactions on Multimedia.

[33]  Stéphane Ayache,et al.  Video Corpus Annotation Using Active Learning , 2008, ECIR.

[34]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[35]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.