Paper information - The MediaMill at TRECVID 2013: Searching Concepts, Objects, Instances and Events in Video

Abstract: In this paper, we summarize our TRECVID 2014 [12] video retrieval experiments. The MediaMill team participated in five tasks: concept detection, object localization, instance search, event recognition, and event recounting. We experimented with concept detection using deep learning and color difference coding [17], object localization using FLAIR [23], instance search from one example [19], event recognition based on VideoStory [4], and event recounting using COSTA [10]. Our experiments focus on establishing the video retrieval value of these innovations. The 2014 edition of the TRECVID benchmark was again fruitful for the MediaMill team, which obtained the best results for concept detection and object localization.

References

[1] Masoud Mazloom, et al. Querying for video events by semantic signatures from few examples, 2013, MM '13.

[2] Paul Over, et al. Creating HAVIC: Heterogeneous Audio Visual Internet Collection, 2012, LREC.

[3] Cees Snoek, et al. COSTA: Co-Occurrence Statistics for Zero-Shot Classification, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Arnold W. M. Smeulders, et al. Locality in Generic Instance Search from One Example, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5] Cordelia Schmid, et al. Aggregating local descriptors into a compact image representation, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6] Xirong Li, et al. Evaluating sources and strategies for learning video concepts from social media, 2013, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI).

[7] Jiri Matas, et al. Efficient representation of local geometry for large scale object retrieval, 2009, CVPR.

[8] Koen E. A. van de Sande, et al. Recommendations for video event recognition using concept vocabularies, 2013, ICMR.

[9] Paul Over, et al. TRECVID 2008 - Goals, Tasks, Data, Evaluation Mechanisms and Metrics, 2010, TRECVID.

[10] Cees Snoek, et al. Recommendations for recognizing video events by concept vocabularies, 2014, Comput. Vis. Image Underst.

[11] Masoud Mazloom, et al. Searching informative concept banks for video event detection, 2013, ICMR.

[12] Koen E. A. van de Sande, et al. Evaluating Color Descriptors for Object and Scene Recognition, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Arnold W. M. Smeulders, et al. Visual-Concept Search Solved?, 2010, Computer.

[14] Kan Chen, et al. The 2013 SESAME Multimedia Event Detection and Recounting System, 2013, TRECVID.

[15] Marcel Worring, et al. Bootstrapping Visual Categorization With Relevant Negatives, 2013, IEEE Transactions on Multimedia.

[16] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[17] Georges Quénot, et al. TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, 2011, TRECVID.

[18] Cordelia Schmid, et al. Dense Trajectories and Motion Boundary Descriptors for Action Recognition, 2013, International Journal of Computer Vision.

[19] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[20] Subhransu Maji, et al. Classification using intersection kernel support vector machines is efficient, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Koen E. A. van de Sande, et al. Fisher and VLAD with FLAIR, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Andrew Zisserman, et al. Three things everyone should know to improve object retrieval, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23] Paul Over, et al. Evaluation campaigns and TRECVid, 2006, MIR '06.

[24] Dennis Koelma, et al. The MediaMill TRECVID 2008 Semantic Video Search Engine, 2008, TRECVID.

[25] Cees Snoek, et al. VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events, 2014, ACM Multimedia.

[26] Florent Perronnin, et al. Modeling the spatial layout of images beyond spatial pyramids, 2012, Pattern Recognit. Lett.

[27] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[28] Marcel Worring, et al. Learning Social Tag Relevance by Neighbor Voting, 2009, IEEE Transactions on Multimedia.

[29] Yannis Avrithis, et al. To Aggregate or Not to aggregate: Selective Match Kernels for Image Search, 2013, 2013 IEEE International Conference on Computer Vision.

[30] Ramakant Nevatia, et al. Evaluating multimedia features and fusion for example-based event detection, 2013, Machine Vision and Applications.

[31] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Masoud Mazloom, et al. Conceptlets: Selective Semantics for Classifying Video Events, 2014, IEEE Transactions on Multimedia.

[33] Stéphane Ayache, et al. Video Corpus Annotation Using Active Learning, 2008, ECIR.

[34] Thomas Mensink, et al. Improving the Fisher Kernel for Large-Scale Image Classification, 2010, ECCV.

[35] Koen E. A. van de Sande, et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.

Citations

Deep Learning Based Imbalanced Data Classification and Information Retrieval for Multimedia Big Data, 2018.

Efficient Imbalanced Multimedia Concept Retrieval by Deep Learning on Spark Clusters, 2017, Int. J. Multim. Data Eng. Manag.

Web-scale Multimedia Search for Internet Video Content, 2016, WSDM.

Super Fast Event Recognition in Internet Videos, 2015, IEEE Transactions on Multimedia.

Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification, 2014, ACM Multimedia.

Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

Predicting Behavioural Patterns in Discussion Forums using Deep Learning on Hypergraphs, 2019, 2019 International Conference on Content-Based Multimedia Indexing (CBMI).

Integrating deep learning with correlation-based multimedia semantic concept detection, 2015.

Correlation-Based Deep Learning for Multimedia Semantic Concept Detection, 2015, WISE.

Multimedia Pivot Tables for Multimedia Analytics on Image Collections

Insight in Image Collections by Multimedia Pivot Tables

Few-Shot Adaptation for Multimedia Semantic Indexing

Semantic Indexing for Large-Scale Video Retrieval

Visual Learning of Socio-Video Semantics

Topological Spatial Verification for Instance Search

Minimally Needed Evidence for Complex Event Recognition in Unconstrained Videos

Enhanced image and video representation for visual recognition

Video Content Understanding Using Text

Objects2action: Classifying and Localizing Actions without Any Video Example
