Fast object re-detection and localization in video for spatio-temporal fragment creation

This paper presents a method for detecting and localizing instances of user-specified objects within a video or a collection of videos. The method is based on the extraction and matching of SURF descriptors in video frames, and incorporates three improvements that enhance both detection accuracy and time efficiency: (a) GPU-based processing is introduced for specific parts of the object re-detection pipeline, (b) a new video-structure-based sampling technique limits the number of frames that need to be processed, and (c) robustness to scale variations is improved by generating and employing additional instances of the object of interest from the one originally provided by the user. Experimental results show that the algorithm achieves high detection accuracy, while its overall processing time makes it suitable for quick instance-based labeling of video and for the creation of object-based spatio-temporal fragments.
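
Improvements (b) and (c) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the sampling rule or the scale factors, so the evenly spaced per-shot sampling, the function names `sample_frames` and `scaled_sizes`, and the default factors are all assumptions made for illustration. Shot boundaries are assumed to come from a separate shot-segmentation step.

```python
from typing import List, Sequence, Tuple


def sample_frames(shots: List[Tuple[int, int]], per_shot: int = 3) -> List[int]:
    """Pick up to `per_shot` evenly spaced frame indices from each shot.

    `shots` holds (first_frame, last_frame) pairs, both inclusive.
    Hypothetical sampling rule; the paper's actual rule may differ.
    """
    picked: List[int] = []
    for first, last in shots:
        length = last - first + 1
        n = min(per_shot, length)
        # Take the frame at the centre of each of n equal slices of the shot.
        for i in range(n):
            picked.append(first + (2 * i + 1) * length // (2 * n))
    return picked


def scaled_sizes(
    width: int, height: int, factors: Sequence[float] = (0.5, 1.0, 2.0)
) -> List[Tuple[int, int]]:
    """Return target sizes for extra instances of the query object.

    The factors are assumed values standing in for zoomed-out and
    zoomed-in versions of the user-provided image.
    """
    return [
        (max(1, round(width * f)), max(1, round(height * f))) for f in factors
    ]


if __name__ == "__main__":
    print(sample_frames([(0, 99), (100, 101)]))
    print(scaled_sizes(640, 480))
```

Only the sampled frames would then go through descriptor extraction and matching, once against each resized instance of the query object, which is how a scheme of this kind reduces processing time while tolerating changes in object scale.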