Human motion analysis from UAV video

Purpose: This paper presents a preliminary solution to the problem of estimating human pose and trajectory from an aerial robot with a monocular camera in near real time.

Design/methodology/approach: The distinguishing feature of the solution is a dynamic classifier selection architecture. Each video frame is corrected for perspective using a projective transformation. A silhouette is then extracted and represented as a Histogram of Oriented Gradients (HOG) descriptor, which is classified by a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. The dynamic classifier consists of a Support Vector Machine (SVM) classifier, C64, that recognizes all 64 classes, and 64 SVM classifiers that recognize four classes each. The four classes of each smaller classifier are chosen based on the temporal relationship between them, as dictated by the gait sequence.

Findings: The solution provides three main advantages. First, classification is efficient because of dynamic selection (4-class rather than 64-class classification). Second, classification errors are confined to neighbors of the true viewpoint: a wrongly estimated viewpoint is at most one viewpoint away from the true one, which enables fast recovery from incorrect estimates. Third, the robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes.

Originality/value: Experiments on both fronto-parallel and aerial videos confirm that the solution achieves accurate pose and trajectory estimation for both kinds of video. For example, on the “walking on an 8-shaped path” data set (1,652 frames), the estimation accuracies are 85 percent for viewpoints and 98.14 percent for poses.
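The dynamic classifier selection described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the paper's method: the 64 classes are taken to be 8 poses times 8 viewpoints, the four classes of each small classifier are taken to be "same class", "next pose", and "next pose at either adjacent viewpoint", the 64-class classifier is used only for the first frame, and a nearest-centroid rule stands in for the SVMs so that the example needs no external library. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: 64 classes arranged as 8 poses x 8 viewpoints.
N_POSES = 8
N_VIEWS = 8

Vector = Sequence[float]


def class_id(pose: int, view: int) -> int:
    """Map a (pose, viewpoint) pair to a single class index in 0..63."""
    return pose * N_VIEWS + view


def successors(cls: int) -> List[int]:
    """Four classes assumed reachable from `cls` in the next frame."""
    pose, view = divmod(cls, N_VIEWS)
    nxt = (pose + 1) % N_POSES
    return [
        class_id(pose, view),                  # pose has not advanced yet
        class_id(nxt, view),                   # next pose, walking straight
        class_id(nxt, (view - 1) % N_VIEWS),   # next pose, turning one way
        class_id(nxt, (view + 1) % N_VIEWS),   # next pose, turning the other way
    ]


def nearest(feature: Vector, centroids: Dict[int, Vector],
            allowed: Sequence[int]) -> int:
    """Stand-in classifier: closest centroid among the allowed classes."""
    def sq_dist(c: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(feature, centroids[c]))
    return min(allowed, key=sq_dist)


def classify_sequence(features: Sequence[Vector],
                      centroids: Dict[int, Vector]) -> List[int]:
    """Use all 64 classes for the first frame, then only the four
    successors of the previous estimate for every later frame."""
    labels: List[int] = []
    for feature in features:
        allowed = list(centroids) if not labels else successors(labels[-1])
        labels.append(nearest(feature, centroids, allowed))
    return labels


# Toy data: each class centroid is simply its (pose, viewpoint) coordinates.
toy_centroids = {
    class_id(p, v): (float(p), float(v))
    for p in range(N_POSES) for v in range(N_VIEWS)
}
toy_features = [(0.0, 0.0), (1.0, 0.1), (2.0, 1.0), (3.0, 1.0)]
toy_labels = classify_sequence(toy_features, toy_centroids)
```

Under these assumptions, every estimate after the first frame can differ from the previous one by at most one viewpoint step, which is one way to read the abstract's claim that viewpoint errors stay confined to adjacent viewpoints.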
