Retrieval of Multiple Instances of Objects in Videos

This paper addresses the retrieval of different instances of an object of interest within a given video document or a video database. The approach relies on a semi-global image representation obtained by over-segmenting the image frames. An aggregation mechanism then groups a set of sub-regions into an object similar to the query, under a global similarity criterion. Two strategies are proposed. The first is a greedy, dynamic region construction method. The second is based on simulated annealing and aims at determining a global optimum. Experimental results show promising performance, with object detection rates of up to 79%.
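
The two aggregation strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the region descriptors, the similarity measure, or the annealing schedule, so everything below is an assumption made for illustration. Each sub-region is taken to be a dictionary with a normalised colour histogram (`hist`) and a pixel count (`area`), the global criterion is taken to be histogram intersection between the merged region and the query, and the function names, the `adjacency` map, and the parameters `steps`, `t0`, and `cooling` are hypothetical. The annealing variant toggles one sub-region per step and, for brevity, does not enforce spatial connectivity.

```python
import math
import random


def merged_hist(regions, subset):
    """Area-weighted sum of the selected regions' histograms, normalised to 1."""
    bins = len(regions[0]["hist"])
    total = [0.0] * bins
    for i in subset:
        for b in range(bins):
            total[b] += regions[i]["area"] * regions[i]["hist"][b]
    s = sum(total)
    return [v / s for v in total] if s > 0 else total


def similarity(regions, subset, query):
    """Assumed global criterion: histogram intersection (1.0 means identical)."""
    if not subset:
        return 0.0
    return sum(min(a, b) for a, b in zip(merged_hist(regions, subset), query))


def greedy_aggregate(regions, adjacency, query):
    """Grow a candidate from the best seed, adding the best adjacent region each step."""
    seed = max(range(len(regions)), key=lambda i: similarity(regions, {i}, query))
    current = {seed}
    score = similarity(regions, current, query)
    while True:
        frontier = {n for i in current for n in adjacency[i]} - current
        best = max(frontier,
                   key=lambda n: similarity(regions, current | {n}, query),
                   default=None)
        if best is None:
            return current, score
        new_score = similarity(regions, current | {best}, query)
        if new_score <= score:  # no neighbour improves the criterion: stop
            return current, score
        current, score = current | {best}, new_score


def anneal_aggregate(regions, query, steps=2000, t0=0.1, cooling=0.995, seed=0):
    """Simulated annealing over subsets of regions with a Metropolis acceptance rule."""
    rng = random.Random(seed)
    current = {rng.randrange(len(regions))}
    score = similarity(regions, current, query)
    best, best_score = set(current), score
    t = t0
    for _ in range(steps):
        candidate = current ^ {rng.randrange(len(regions))}  # toggle one region
        cand_score = similarity(regions, candidate, query)
        delta = cand_score - score
        # Always accept improvements; accept degradations with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, score = candidate, cand_score
            if score > best_score:
                best, best_score = set(current), score
        t *= cooling  # geometric cooling (assumed schedule)
    return best, best_score
```

The greedy variant stops at the first point where no adjacent sub-region improves the criterion, so it may settle on a local optimum. The annealing variant occasionally accepts a worse subset while the temperature is high, which is what allows it to search for a global optimum at a higher computational cost.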
