Contextual person detection in multi-modal outdoor surveillance

We present a new approach to person detection in outdoor surveillance. A multi-modal segmentation of the scene (using RGB, polarimetric and thermal sensors) into regions such as sky, road, bush, trees and grass is used to learn the spatial context in which people normally appear, from training data of normal activity. The context feature is a novel application of the work of Wolf et al. [1], and it enables the probability of a person appearing at a given location to be computed. Using motion as a precursor to deploying a HOG person detector, in conjunction with the spatial context likelihood, yields a significant improvement in person detection for challenging scenes. We report a comprehensive ROC analysis on 4 outdoor scenes for normal activity detection. Anomaly detection is then achieved using the learned context: 72% of true positive anomalies are found at a false positive rate of 0.19% over all data in the thermal and visual bands.
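The pipeline described above (motion gating, a person detector, then a learned spatial context likelihood) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not reproduce the context feature of Wolf et al.: it assumes, purely for illustration, that context is a Laplace-smoothed frequency of person sightings per segmented region class. The names `learn_context_prior` and `classify`, the thresholds, and the toy training data are all hypothetical.

```python
from collections import Counter

# Region classes named in the abstract; the exact label set is an assumption.
REGIONS = ["sky", "road", "bush", "trees", "grass"]


def learn_context_prior(training_regions, regions=REGIONS, alpha=1.0):
    """Estimate P(region | person) from the region labels under person
    locations in normal training data, with Laplace smoothing (alpha)."""
    counts = Counter(training_regions)
    total = len(training_regions) + alpha * len(regions)
    return {r: (counts[r] + alpha) / total for r in regions}


def classify(moving, detector_score, region, prior,
             det_thresh=0.5, ctx_thresh=0.1):
    """Label one candidate window.

    moving         -- True if motion was detected at the window (gating step)
    detector_score -- confidence from a person detector (e.g. HOG), in [0, 1]
    region         -- segmentation label under the window
    prior          -- output of learn_context_prior
    The two thresholds are hypothetical illustration values.
    """
    # Motion acts as a precursor: static windows never reach the detector.
    if not moving or detector_score < det_thresh:
        return "reject"
    # A confident detection in an unlikely region is flagged as an anomaly.
    return "normal" if prior[region] >= ctx_thresh else "anomaly"


# Toy training data: people were seen mostly on the road, sometimes on grass.
training = ["road"] * 40 + ["grass"] * 8
prior = learn_context_prior(training)

print(classify(True, 0.9, "road", prior))
print(classify(True, 0.9, "sky", prior))
print(classify(False, 0.9, "road", prior))
```

In this toy setup a confident, moving detection on the road is treated as normal activity, the same detection in a sky region is flagged as anomalous, and a static window is rejected before the detector is consulted.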

[1]  A. Torralba,et al.  The role of context in object recognition , 2007, Trends in Cognitive Sciences.

[2]  S. Ullman,et al.  Spatial Context in Recognition , 1996, Perception.

[3]  Bernt Schiele,et al.  A Dynamic Conditional Random Field Model for Joint Labeling of Object and Scene Classes , 2008, ECCV.

[4]  智一 吉田,et al.  A Study of an Automatic Field Map Generation Method Using Efficient Graph-Based Image Segmentation , 2014 .

[5]  Andrew M. Wallace,et al.  Efficient Resource Allocation using a Multiobjective Utility Optimisation Method , 2008 .

[6]  Daphne Koller,et al.  Learning Spatial Context: Using Stuff to Find Things , 2008, ECCV.

[7]  C. Tucker.  Red and photographic infrared linear combinations for monitoring vegetation , 1979 .

[8]  Jiebo Luo,et al.  Probabilistic spatial context models for scene content understanding , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[10]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  J. Keller,et al.  Learning spatial relationships in computer vision , 1996, Proceedings of IEEE 5th International Fuzzy Systems.

[12]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Neil M. Robertson,et al.  Contextual smoothing of image segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[14]  Lior Wolf,et al.  A Critical View of Context , 2006, International Journal of Computer Vision.

[15]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res.

[16]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Antonio Torralba,et al.  Object Detection and Localization Using Local and Global Features , 2006, Toward Category-Level Object Recognition.