Object detection in surveillance video from dense trajectories

Detecting objects such as humans or vehicles is a central problem in video surveillance. Myriad standard approaches exist for this problem. At their core, approaches consider either the appearance of people, patterns of their motion, or differences from the background. In this paper we build on dense trajectories, a state-of-the-art approach for describing spatio-temporal patterns in video sequences. We demonstrate an application of dense trajectories to object detection in surveillance video, showing that they can be used to both regress estimates of object locations and accurately classify objects.

[1]  Ramakant Nevatia,et al.  Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors , 2007, International Journal of Computer Vision.

[2]  Xiaogang Wang,et al.  Switchable Deep Network for Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Serge J. Belongie,et al.  Counting Crowded Moving Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Shenghuo Zhu,et al.  Efficient Object Detection and Segmentation for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  L. Davis,et al.  W 4 S: a Real-time System for Detecting and Tracking People in 2 1 2 D , 1998 .

[10]  Lei Chen,et al.  Deep Structured Models For Group Activity Recognition , 2015, BMVC.

[11]  Yuting Zhang,et al.  Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[13]  Yang Wang,et al.  Learning hierarchical poselets for human parsing , 2011, CVPR 2011.

[14]  Atsushi Nakazawa,et al.  Human tracking using distributed vision systems , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[15]  Dariu Gavrila,et al.  Real-time object detection for "smart" vehicles , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[16]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[17]  Stephen Gould,et al.  Region-based Segmentation and Object Detection , 2009, NIPS.

[18]  David Beymer,et al.  A real-time computer vision system for vehicle tracking and traffic surveillance , 1998 .

[19]  Iasonas Kokkinos,et al.  Modeling local and global deformations in Deep Learning: Epitomic convolution, Multiple Instance Learning, and sliding window detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Tomaso A. Poggio,et al.  Pedestrian detection using wavelet templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[24]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[27]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[28]  Larry S. Davis,et al.  AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video , 2011, AVSS.

[29]  Christoph Bregler,et al.  Learning and recognizing human dynamics in video sequences , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[31]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[32]  Xiaogang Wang,et al.  Modeling Mutual Visibility Relationship in Pedestrian Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[34]  Robert T. Collins,et al.  Optimized Pedestrian Detection for Multiple and Occluded People , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Guang-Bin Huang,et al.  Extreme learning machine: a new learning scheme of feedforward neural networks , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[36]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[37]  Xiaogang Wang,et al.  A discriminative deep model for pedestrian detection with occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Armin B. Cremers,et al.  Informed Haar-Like Features Improve Pedestrian Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Xiaogang Wang,et al.  DeepID-Net: Deformable deep convolutional neural networks for object detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[41]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[42]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Marcus R. Frean,et al.  Dependent Gaussian Processes , 2004, NIPS.

[44]  Peter V. Gehler,et al.  Occlusion Patterns for Object Class Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[46]  Shihong Lao,et al.  A Structural Filter Approach to Human Detection , 2010, ECCV.

[47]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Dan Roth,et al.  Learning a Sparse Representation for Object Detection , 2002, ECCV.

[49]  Aggelos K. Katsaggelos,et al.  Detector Ensemble , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[51]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Brendan J. Frey,et al.  Epitomic analysis of appearance and shape , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.