An efficient approach to content-based object retrieval in videos

Traditional methods for content-based object retrieval suffer from high space and time consumption. In this paper, we propose a novel method for efficient object retrieval in videos. First, we compress the feature descriptors by tracking feature points between consecutive frames and encoding the tracked points. The encoding applies the motion prediction used in video codecs. Second, we propose a new algorithm that locates objects by searching for the high-density positions of the related feature points in each frame. To improve speed, we count the words corresponding to the feature points within the query target and calculate their spatial distributions. Experimental results show that the proposed method outperforms previous methods.
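
The localization step can be illustrated with a minimal sketch. It is not the algorithm of the paper, only one simple way to search for a high-density position: feature points in a frame whose visual word also occurs in the query target are histogrammed over a coarse grid, and the fullest cell is returned. The function name `locate_object`, the `(word_id, x, y)` tuple layout, and the `cell_size` value are all assumptions made for illustration.

```python
from collections import Counter


def locate_object(query_words, frame_features, cell_size=40):
    """Estimate an object position as the centre of the densest grid cell.

    query_words    : iterable of visual-word ids found inside the query target
    frame_features : iterable of (word_id, x, y) tuples for one frame
    cell_size      : side length of a grid cell in pixels (assumed value)
    """
    wanted = set(query_words)
    # Keep only feature points whose visual word also occurs in the query.
    matched = [(x, y) for word, x, y in frame_features if word in wanted]
    if not matched:
        return None
    # Histogram the matched points over a coarse grid (their spatial distribution).
    cells = Counter((int(x // cell_size), int(y // cell_size)) for x, y in matched)
    # Pick the fullest cell; ties go to the smallest cell index.
    (cx, cy), votes = min(cells.items(), key=lambda kv: (-kv[1], kv[0]))
    centre = ((cx + 0.5) * cell_size, (cy + 0.5) * cell_size)
    return centre, votes
```

Calling `locate_object(query_words, frame_features)` returns the centre of the winning cell together with its vote count, or `None` when no feature point in the frame shares a word with the query. A fixed grid is the crudest possible density estimate; a kernel density estimate or a mode-seeking procedure would give smoother peaks at a higher cost.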
