Paper Information - Object Detection with Vocabularies of Space-time Descriptors

Object Detection with Vocabularies of Space-time Descriptors

This paper presents a novel framework for object detection in security and broadcast videos. Our method assumes that object classes are not known in advance and exploits the space-time properties of the videos to create a vocabulary that describes these classes. Local space-time features have recently become a popular video representation for action recognition and object detection. Several methods for feature localization and description have been proposed in the literature, and promising recognition results have been demonstrated for a number of action classes. In this work we propose the use of different kinds of descriptors to create vocabularies for different object detection tasks. To describe the videos better, we build a background model that attempts to clean up and track the areas containing objects. The interest points that characterize the objects in the videos are computed with a temporal variant of the well-known Harris corner detector. From the descriptors obtained at the interest points, a vocabulary is built using the kinds of videos we want to train on. We then compute frequency histograms of the training videos over the vocabulary and use them to train a binary classifier on the classes; following the same procedure, but without the vocabulary-building step, we perform detection and monitoring of the objects. The new method is also compared with a state-of-the-art method and obtains better results in both accuracy and false object rejection.
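The vocabulary and frequency-histogram steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering method, so plain k-means is assumed here; the function names (build_vocabulary, word_histogram, nearest_word) and the toy 2-D descriptors are hypothetical and only stand in for real space-time descriptors. The binary classifier that would consume the histograms is left out.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def build_vocabulary(descriptors, num_words, iterations=20, seed=0):
    """Cluster training descriptors with k-means (an assumed choice).

    The resulting cluster centers play the role of the visual words.
    """
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def word_histogram(descriptors, vocabulary):
    """Normalized frequency histogram of one video over the vocabulary."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Toy descriptors from two well-separated groups (illustrative data only).
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
vocabulary = build_vocabulary(training, num_words=2)

# Histogram of a new "video" whose descriptors all fall near the first group.
print(word_histogram([[0.2, 0.1], [0.5, 0.9]], vocabulary))
```

At detection time the vocabulary is reused as-is: only word_histogram is called on the descriptors of the new video, and the resulting histogram would be passed to the trained classifier.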
