Automatic human action recognition in a scene from visual inputs

Surveillance is normally performed by humans, since it requires visual intelligence. However, this work can be dull and dangerous, especially in military operations. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present a novel system that recognizes human actions, a capability that is relevant for detecting operationally significant activity. Central to the system is a breakdown of high-level perceptual concepts (verbs) into simpler observable events. The system is trained on 3482 videos and evaluated on 2589 videos from the DARPA Mind's Eye program; for each video, human annotations indicate the presence or absence of 48 different actions. The results show that our system reaches good performance, approaching the average human response.
