Paper information - A spatio-temporal pyramid matching for video retrieval

Highlights
- We introduce a content-based video retrieval system that takes a video shot as the query.
- Shot boundaries are found using a classifier learned with a boosting algorithm.
- The similarity of video shots is calculated by spatio-temporal pyramid matching.
- The pyramid-matching kernel incorporates the temporal dimension into the matching scheme.
- Experiments on sports videos and UCF50 show the effectiveness of our method.

An efficient video retrieval system is essential for finding relevant video content in a large set of video clips, which typically contains several heterogeneous clips to match against. In this paper, we introduce a content-based video matching system that finds the most relevant video segments in a video database for a given query video clip. Finding relevant video clips is not a trivial task, because objects in a video clip can move constantly over time. To perform this task efficiently, we propose a novel video matching method called Spatio-Temporal Pyramid Matching (STPM). Considering features of objects in 2D space and time, STPM recursively divides a video clip into a 3D spatio-temporal pyramidal space and compares the features at different resolutions. To improve retrieval performance, we consider both static and dynamic features of objects. We also provide a sufficient condition under which the matching gains additional benefit from temporal information. The experimental results show that STPM performs better than the other video matching methods.
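The recursive space-time division described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each feature is a hypothetical tuple `(x, y, t, word)` with coordinates normalized to [0, 1] and a quantized visual-word label, and it borrows the level weights of 2D spatial pyramid matching, which may differ from the weights used in the paper. The function names are illustrative only.

```python
from collections import Counter


def cell_histogram(features, level):
    """Count (cell, word) pairs on a grid with 2**level cells per axis (x, y, t)."""
    n = 2 ** level
    hist = Counter()
    for x, y, t, word in features:
        # Assumption: coordinates lie in [0, 1]; a value of 1.0 falls in the last cell.
        cell = tuple(min(int(c * n), n - 1) for c in (x, y, t))
        hist[(cell, word)] += 1
    return hist


def intersection(h1, h2):
    """Histogram intersection: number of features matched cell by cell."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def stpm_similarity(clip_a, clip_b, levels=2):
    """Weighted sum of histogram intersections over pyramid levels 0..levels.

    Assumption: weights follow spatial pyramid matching, so finer levels
    (stricter agreement in space and time) count more than coarser ones.
    """
    score = 0.0
    for level in range(levels + 1):
        if level == 0:
            weight = 1 / 2 ** levels
        else:
            weight = 1 / 2 ** (levels - level + 1)
        score += weight * intersection(
            cell_histogram(clip_a, level), cell_histogram(clip_b, level)
        )
    return score


if __name__ == "__main__":
    early = [(0.1, 0.1, 0.1, 0)]  # visual word 0 near the start of the clip
    late = [(0.1, 0.1, 0.9, 0)]   # same word and position, near the end
    print(stpm_similarity(early, early))
    print(stpm_similarity(early, late))
```

In this sketch, two clips that contain the same visual word at the same image position but at opposite ends of the timeline match only at the coarsest level, so they score lower than two clips in which the word also coincides in time. That is the effect of adding the temporal axis to the pyramid.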

Won Jong Jeon | Sang-Chul Lee | Jaesik Choi | Ziyu Wang
