IRIM at TRECVID 2014: Semantic Indexing and Instance Search

The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2014 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline for computing scores for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots that are most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.1387, which ranked us 5th out of 19 participants. For the instance search task, we used both object-based queries and frame-based queries. We formulated the query in the standard way, as a comparison of visual signatures: either of the object with parts of database frames, or of the query frame with database frames. To produce visual signatures we also used two approaches: the first is the baseline Bag-Of-Visual-Words (BOVW) model based on the SURF interest point descriptor; the second is a Bag-Of-Regions (BOR) model that extends the traditional notion of a BOVW vocabulary from keypoint-based descriptors to region-based descriptors.
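To make the baseline signature comparison more concrete, here is a minimal sketch of a Bag-Of-Visual-Words signature and a frame ranking built on it. It is an illustration under stated assumptions, not the IRIM implementation: the function names (`nearest_word`, `bovw_signature`, `cosine_similarity`, `rank_frames`) are hypothetical, the descriptors are plain lists of floats standing in for SURF descriptors, the vocabulary is assumed to be already learned, and cosine similarity is an assumed choice of comparison measure that the abstract does not specify.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[k])),
    )


def bovw_signature(descriptors, vocabulary):
    """L1-normalized histogram of visual-word occurrences for one image or region."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = sum(counts.values())
    if total == 0:
        # No local descriptors were found: return an all-zero signature.
        return [0.0] * len(vocabulary)
    return [counts[k] / total for k in range(len(vocabulary))]


def cosine_similarity(a, b):
    """Cosine similarity between two signatures (assumed measure, see lead-in)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_frames(query_sig, frame_sigs):
    """Indices of database frames, most similar to the query first."""
    scores = [cosine_similarity(query_sig, s) for s in frame_sigs]
    return sorted(range(len(frame_sigs)), key=lambda i: scores[i], reverse=True)
```

In this sketch an object-based query would call `bovw_signature` on the descriptors inside the object region only, while a frame-based query would use all descriptors of the query frame; the ranking step is the same in both cases.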
