Drone-Action: An Outdoor Recorded Drone Video Dataset for Action Recognition

Aerial human action recognition is an emerging topic in drone applications. Commercial drone platforms capable of detecting basic human actions such as hand gestures have been developed. However, few aerial video datasets are available to support research into aerial human action analysis. Most existing datasets are confined to indoor scenes or object tracking, and many outdoor datasets lack sufficient human body detail for applying state-of-the-art machine learning techniques. To fill this gap and enable research in wider application areas, we present an action recognition dataset recorded in an outdoor setting. A free-flying drone was used to record 13 dynamic human actions. The dataset contains 240 high-definition video clips consisting of 66,919 frames. All of the videos were recorded at low altitude and low speed to capture the maximum human pose detail at relatively high resolution. This dataset should be useful in many research areas, including action recognition, surveillance, situational awareness, and gait analysis. To test the dataset, we evaluated it with a pose-based convolutional neural network (P-CNN) and high-level pose feature (HLPF) descriptors. The overall baseline action recognition accuracy obtained with P-CNN was 75.92%.
