Detection of Multiple Instances of Video Objects

This paper addresses the retrieval of different instances of an object of interest within a given video document or a video database. The approach relies on a semi-global image representation obtained by over-segmenting the image frames. An aggregation mechanism then groups a set of sub-regions into an object similar to the query, under a global similarity criterion. Two approaches are proposed. The first is a greedy, dynamic region construction method. The second is based on simulated annealing and aims to determine a global optimum of the similarity function. Experimental results show promising performance, with FT and BE detection rates of up to 66% and 86%, respectively.
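To make the two aggregation strategies concrete, the following Python sketch groups over-segmented sub-regions into a candidate object under a global similarity criterion. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the region descriptor, the similarity measure, or the annealing schedule. Here each sub-region is assumed to carry a normalized color histogram and an area, similarity is assumed to be histogram intersection, and the names `greedy_aggregate`, `anneal_aggregate`, `steps`, `t0`, and `cooling` are hypothetical.

```python
import math
import random

def similarity(hist_a, hist_b):
    """Histogram intersection of two normalized histograms (assumed criterion)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

def merged_histogram(regions, selected):
    """Area-weighted histogram of the union of the selected sub-regions."""
    total = sum(regions[i]["area"] for i in selected)
    bins = len(regions[next(iter(selected))]["hist"])
    return [sum(regions[i]["area"] * regions[i]["hist"][b] for i in selected) / total
            for b in range(bins)]

def score(regions, selected, query_hist):
    """Global similarity between the aggregated candidate and the query."""
    return similarity(merged_histogram(regions, selected), query_hist)

def greedy_aggregate(regions, adjacency, query_hist):
    """Grow a candidate from the best single sub-region, adding the adjacent
    sub-region that most improves the global score until no addition helps."""
    seed = max(regions, key=lambda i: similarity(regions[i]["hist"], query_hist))
    selected = {seed}
    best = score(regions, selected, query_hist)
    while True:
        frontier = set().union(*(adjacency[i] for i in selected)) - selected
        if not frontier:
            break
        cand_score, cand = max((score(regions, selected | {j}, query_hist), j)
                               for j in frontier)
        if cand_score <= best:
            break
        selected.add(cand)
        best = cand_score
    return selected, best

def anneal_aggregate(regions, query_hist, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over subsets of sub-regions: flip one sub-region per
    step and accept worse states with probability exp(delta / temperature).
    Adjacency is ignored here for brevity (a simplifying assumption)."""
    rng = random.Random(seed)
    ids = sorted(regions)
    current = {rng.choice(ids)}
    cur_score = score(regions, current, query_hist)
    best, best_score = set(current), cur_score
    temperature = t0
    for _ in range(steps):
        candidate = current ^ {rng.choice(ids)}  # toggle membership of one sub-region
        if candidate:  # skip the empty selection
            delta = score(regions, candidate, query_hist) - cur_score
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, cur_score = candidate, cur_score + delta
                if cur_score > best_score:
                    best, best_score = set(current), cur_score
        temperature *= cooling  # geometric cooling (assumed schedule)
    return best, best_score

# Toy frame: four sub-regions in a chain 0-1-2-3, three color bins.
REGIONS = {
    0: {"area": 100, "hist": [1.0, 0.0, 0.0]},
    1: {"area": 100, "hist": [0.0, 1.0, 0.0]},
    2: {"area": 100, "hist": [0.0, 0.0, 1.0]},
    3: {"area": 50, "hist": [0.0, 0.0, 1.0]},
}
ADJACENCY = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
QUERY = [0.5, 0.5, 0.0]  # query object: half first color, half second color

if __name__ == "__main__":
    print(greedy_aggregate(REGIONS, ADJACENCY, QUERY))
    print(anneal_aggregate(REGIONS, QUERY))
```

In this toy setting the greedy method stops as soon as no adjacent sub-region improves the score, so it can settle on a local optimum in harder cases. The annealing variant occasionally accepts a worse selection while the temperature is high, which is what lets it look for a global optimum of the similarity function at the cost of more evaluations.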
