Focus-of-attention for human activity recognition from UAVs

This paper presents a system that extracts metadata about human activities from full-motion video recorded by a UAV. The pipeline consists of four components: tracking, motion-feature extraction, representation of each track in terms of its motion features, and classification of each track as one of the human activities of interest: walk, run, throw, dig and wave. Our contribution is to show how a robust system for human activity recognition from UAVs can be constructed, and that focus-of-attention is essential: without tracking, recognition performance deteriorates, and the combination of tracking and human detection is needed to focus attention on the relevant tracks. The best-performing configuration, which combines tracking, human detection and a per-track analysis of the five activities, achieves an average accuracy of 93%. A graphical user interface is proposed to aid the operator or analyst in retrieving the parts of the video that contain particular human activities. Our demo is available on YouTube.
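To make the pipeline structure concrete, the sketch below shows a minimal per-track analysis in Python: each track is reduced to a fixed-length motion-feature vector and classified into one of the five activities. This is an illustrative skeleton only; the function and class names, the simple displacement-based features, and the choice of a random-forest classifier are assumptions for this example and are not taken from the paper.

```python
# Hypothetical sketch of a per-track activity classifier (not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ACTIVITIES = ["walk", "run", "throw", "dig", "wave"]

def extract_motion_features(track_boxes):
    """Toy motion descriptor: statistics of the displacement of the tracked
    bounding-box centers over time (a stand-in for real motion features)."""
    centers = np.array([(x + w / 2.0, y + h / 2.0) for x, y, w, h in track_boxes])
    deltas = np.diff(centers, axis=0)          # per-frame displacement
    speeds = np.linalg.norm(deltas, axis=1)    # per-frame speed
    return np.array([speeds.mean(), speeds.std(), speeds.max(), speeds.min()])

def train_classifier(train_tracks, train_labels):
    """Fit a classifier on labelled tracks; each track is a list of (x, y, w, h)."""
    X = np.stack([extract_motion_features(t) for t in train_tracks])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X, train_labels)
    return clf

def classify_track(clf, track_boxes):
    """Assign one of the five activity labels to a single track."""
    features = extract_motion_features(track_boxes).reshape(1, -1)
    return clf.predict(features)[0]
```

In this sketch, focus-of-attention corresponds to only feeding tracks produced by the tracker and confirmed by the human detector into classify_track, rather than classifying every moving region in the frame.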
