Stable Hyper-pooling and Query Expansion for Event Detection

This paper makes two complementary contributions to event retrieval in large video collections. First, we propose hyper-pooling strategies that encode the frame descriptors of a video into a representation of the whole sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking of the retrieved videos. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that the two methods are complementary and improve event retrieval results without sacrificing efficiency.
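
The following minimal Python sketch makes the baseline side of each contribution concrete, under stated assumptions. It uses a VLAD-style pooling of frame descriptors over a k-means codebook as one example of the "regular pooling techniques based on k-means quantization" mentioned above. It also uses plain average query expansion: the query is averaged with its top-ranked results, and the database is then re-ranked. All function names and parameters (`kmeans`, `kmeans_pool`, `average_query_expansion`, `top`) are hypothetical, and the descriptors are toy lists of floats. Neither function reproduces the paper's stable hyper-pooling or its own query expansion / similarity adaptation.

```python
# Hypothetical illustration: a k-means (VLAD-style) pooling baseline and plain
# average query expansion. NOT the paper's stable hyper-pooling or its method.
import math
import random
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nearest(centroids: List[Vector], x: Vector) -> int:
    """Index of the centroid closest to x (hard assignment, squared L2)."""
    return min(range(len(centroids)),
               key=lambda k: sum((c - xi) ** 2 for c, xi in zip(centroids[k], x)))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd iterations, initialised with k randomly sampled points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_pool(frames: List[Vector], centroids: List[Vector]) -> Vector:
    """VLAD-style pooling of frame descriptors into one video vector:
    each frame adds its residual to the block of its nearest centroid."""
    d = len(frames[0])
    blocks = [[0.0] * d for _ in centroids]
    for f in frames:
        j = nearest(centroids, f)
        for i in range(d):
            blocks[j][i] += f[i] - centroids[j][i]
    return l2_normalize([x for block in blocks for x in block])


def average_query_expansion(query: Vector, database: List[Vector], top: int = 3) -> List[int]:
    """Rank by dot product, average the query with its top results, re-rank."""
    ranked = sorted(range(len(database)), key=lambda i: -dot(query, database[i]))
    neighbours = [database[i] for i in ranked[:top]]
    expanded = l2_normalize([sum(col) for col in zip(query, *neighbours)])
    return sorted(range(len(database)), key=lambda i: -dot(expanded, database[i]))
```

The hard assignment in `nearest` is one plausible source of instability in such k-means baselines. A frame descriptor lying near a cell boundary can switch cells under a small perturbation, which moves its whole contribution to a different block of the pooled vector. The stable alternatives proposed in the paper are not shown here.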
