Large vocabulary quantization for searching instances from videos

A promising application involving video collections is to search a video database for relevant segments given a few visual examples of a specific instance, e.g. a person, object, or place. This problem is difficult due to lighting variations, viewpoint changes, partial occlusion, and large changes in appearance. In this paper, we focus on a restricted instance search task in which the region of the specific instance to be searched for is manually labeled on each query image. We formulate this problem in a Bag-of-Words framework with large-vocabulary quantization, and place particular emphasis on investigating to what extent these labeled instance regions can be exploited. The contribution of this paper is twofold. First, we propose an instance search algorithm that outperforms all submissions on the TRECVID 2011 instance search dataset. Second, a thorough analysis of the experimental results shows that our top performance stems mainly from similar-scene retrieval rather than true instance search. This observation reveals that in the current dataset the background dominates over the instance itself, and it suggests that a promising direction for further improving the algorithm, and possibly the key to this challenge, is to investigate how to truly take advantage of the additional labeled instance regions. We believe our study opens a window for future instance search methods.
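
The abstract does not spell out the retrieval pipeline, so the following is only a minimal sketch of a conventional large-vocabulary Bag-of-Words system of the kind it describes: local descriptors are quantized against a visual vocabulary, each image becomes a tf-idf weighted word histogram, and database images are ranked by cosine similarity to the query. All names and parameters below (e.g. `k=1000`, brute-force nearest-word assignment) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of a large-vocabulary Bag-of-Words retrieval pipeline.
# Not the authors' implementation; parameters and structure are assumptions.
import numpy as np

def quantize(descriptors, centers):
    """Map each local descriptor to the index of its nearest visual word
    (brute-force distances; real systems use approximate nearest neighbour)."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def build_vocabulary(descriptors, k=1000, iters=20, seed=0):
    """Toy k-means over a sample of local descriptors (e.g. SIFT).
    Large-vocabulary systems instead use vocabulary trees or
    approximate k-means with ~1M words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(word_ids, k):
    """Visual-word occurrence counts for one image (or one labeled region)."""
    return np.bincount(word_ids, minlength=k).astype(float)

def tfidf_index(histograms):
    """Stack per-image histograms, weight them by idf, and L2-normalize."""
    H = np.vstack(histograms)
    df = (H > 0).sum(axis=0) + 1e-9          # document frequency of each word
    idf = np.log(len(H) / df)
    W = H * idf
    W /= np.linalg.norm(W, axis=1, keepdims=True) + 1e-12
    return W, idf

def search(query_descriptors, centers, W, idf, top_n=10):
    """Rank database images by cosine similarity to the query's BoW vector.
    To use the labeled instance region, pass only the descriptors whose
    keypoints fall inside that region."""
    q = bow_histogram(quantize(query_descriptors, centers), len(centers)) * idf
    q /= np.linalg.norm(q) + 1e-12
    scores = W @ q
    return np.argsort(-scores)[:top_n], scores
```

In a practical system the vocabulary would contain on the order of a million words, both vocabulary construction and word assignment would rely on approximate nearest-neighbour search, and an inverted index with optional spatial verification would replace the dense matrix multiplication used here for brevity.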
