IRIM at TRECVID 2014: Semantic Indexing and Instance Search

The IRIM group is a consortium of French teams working on multimedia indexing and retrieval. This paper describes its participation in the TRECVID 2014 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline to compute scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.1387, which ranked us 5th out of 19 participants. For the instance search task, we used both object-based and frame-based queries. We formulated the query in the standard way, as a comparison of visual signatures, either between the query object and parts of database (DB) frames or between the query frame and DB frames. To produce visual signatures we also used two approaches: the first is the baseline Bag-Of-Visual-Words (BOVW) model based on the SURF interest point descriptor; the second is a Bag-Of-Regions (BOR) model that extends the traditional notion of a BOVW vocabulary from keypoint-based descriptors to region-based descriptors.
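
As an illustration of the signature comparison described above, the following is a minimal sketch of a BOVW-style search, not the system used in the submitted runs. It assumes that local descriptors have already been extracted and that a visual vocabulary is available (for example, from clustering). The function names, the toy 2-D descriptors, and the choice of histogram intersection as the similarity measure are all assumptions made for the example.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def bovw_signature(descriptors, vocabulary):
    """Build an L1-normalized histogram of visual-word occurrences."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1.0
    total = sum(hist)
    # An image with no descriptors keeps an all-zero signature.
    return [h / total for h in hist] if total > 0 else hist


def histogram_intersection(a, b):
    """Similarity in [0, 1] for two L1-normalized histograms (assumed measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


def rank_frames(query_descriptors, frames, vocabulary):
    """Rank DB frames by decreasing similarity to the query signature.

    `frames` maps a frame identifier to its list of local descriptors.
    """
    query_sig = bovw_signature(query_descriptors, vocabulary)
    scored = [
        (name, histogram_intersection(query_sig, bovw_signature(descs, vocabulary)))
        for name, descs in frames.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SURF descriptors have 64 dimensions.
    vocabulary = [(0.0, 0.0), (10.0, 10.0)]
    query = [(0.0, 1.0), (1.0, 0.0), (9.0, 9.0)]
    frames = {
        "frame_a": [(0.0, 0.0), (0.5, 0.5), (10.0, 10.0)],
        "frame_b": [(10.0, 10.0)],
    }
    for name, score in rank_frames(query, frames, vocabulary):
        print(name, round(score, 3))
```

For an object-based query, the same functions could be applied to the descriptors that fall inside the query object's region and inside candidate parts of each DB frame, instead of to whole frames.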
