Retrieving Objects, People and Places from a Video Collection: TRECVID'12 Instance Search Task

We participated in the TRECVID 2012 instance search (INS) task to measure how much we could improve the performance of a state-of-the-art video retrieval system that is based on local features and relies solely on content-based retrieval methods. Our agenda consisted of incremental additions to the system: filtering of local features, custom codebook generation, and tailored ranking metrics for the indexed videos. Three versions of our system were tested, each iteratively including algorithms that consistently pushed the system’s performance further. Because ground truth was not available while the system was being implemented, we used artificially generated datasets to estimate how much progress we were making with each additional component. We showed that improvements requiring little computational and human effort can already have a positive impact on the system’s performance.
