Human Pose and Path Estimation from Aerial Video Using Dynamic Classifier Selection

We consider the problem of estimating human pose and trajectory in near real time from an aerial robot equipped with a monocular camera. We present a preliminary solution whose distinguishing feature is a dynamic classifier selection architecture. In our solution, each video frame is first corrected for perspective using a projective transformation. Two alternative feature sets are then extracted: (i) Histogram of Oriented Gradients (HOG) features of the silhouette, and (ii) Convolutional Neural Network (CNN) features of the RGB image. The features (HOG or CNN) are classified using a dynamically selected classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. Our solution provides three main advantages: (i) classification is efficient due to dynamic selection (4-class vs. 64-class classification); (ii) classification errors are confined to neighbors of the true viewpoints; and (iii) the robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Experiments conducted on both fronto-parallel videos and aerial videos confirm that our solution can achieve accurate pose and trajectory estimation in both scenarios. We found that HOG features provide higher accuracy than CNN features. For example, applying the HOG-based variant of our scheme to the "walking on a figure 8-shaped path" dataset (1652 frames) achieved estimation accuracies of 99.6% for viewpoints and 96.2% for the number of poses.
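
The dynamic selection idea described above (choosing among 4 candidate classes per frame instead of all 64) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the 64 classes are divided or which 4 classes are candidates at each step, so the 8 x 8 pose-viewpoint grid, the neighbor rule in `candidate_classes`, and the `score` callback are all assumptions introduced here for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: 64 classes arranged as 8 poses x 8 viewpoints.
# The abstract only gives the total (64), not this split.
N_POSES = 8
N_VIEWS = 8

ClassId = Tuple[int, int]  # (pose index, viewpoint index)


def candidate_classes(prev: ClassId) -> List[ClassId]:
    """Return 4 candidate classes for the next frame.

    Hypothetical rule: the walker either stays in the same class, or
    advances one pose while keeping, decrementing, or incrementing
    the viewpoint by one step.
    """
    pose, view = prev
    nxt = (pose + 1) % N_POSES
    return [
        (pose, view),                 # no change since the last frame
        (nxt, view),                  # next pose, walking straight
        (nxt, (view - 1) % N_VIEWS),  # next pose, turning one way
        (nxt, (view + 1) % N_VIEWS),  # next pose, turning the other way
    ]


def classify_sequence(
    frames: Sequence[Sequence[float]],
    score: Callable[[Sequence[float], ClassId], float],
    start: ClassId,
) -> List[ClassId]:
    """Label each frame by scoring only the 4 candidates of the previous label.

    `score(features, class_id)` is a placeholder for any per-class
    classifier score (for example one computed from HOG or CNN features).
    """
    current = start
    path: List[ClassId] = []
    for features in frames:
        candidates = candidate_classes(current)
        current = max(candidates, key=lambda c: score(features, c))
        path.append(current)
    return path
```

Under this assumed neighbor rule, each frame costs 4 score evaluations rather than 64, and a wrong decision can only move the estimate to an adjacent viewpoint, which is in the spirit of advantages (i) and (ii) listed in the abstract.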
