Flycon: real-time environment-independent multi-view human pose estimation with aerial vehicles

We propose a real-time method for the infrastructure-free estimation of articulated human motion. The approach leverages a swarm of camera-equipped flying robots and jointly optimizes the swarm's state and the skeletal state, which comprises the 3D joint positions and a set of bones. Our method allows tracking the motion of human subjects, for example an athlete, over long time horizons and long distances, in challenging settings and at large scale, where fixed-infrastructure approaches are not applicable. The proposed algorithm uses active infrared markers, runs in real time, and accurately estimates robot and human pose parameters online without the need for accurately calibrated or stationary mounted cameras. Our method i) estimates a global coordinate frame for the MAV swarm, ii) jointly optimizes the human pose and relative camera positions, and iii) estimates the lengths of the human bones. The entire swarm is then controlled via a model predictive controller to maximize visibility of the subject from multiple viewpoints, even under fast motion such as jumping or jogging. We demonstrate our method in a number of difficult scenarios, including the capture of long locomotion sequences at the scale of a triplex gym, on non-planar terrain, while climbing, and in outdoor scenarios.
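To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of the kind of joint optimization the abstract describes: given 2D detections of the active infrared markers in several drone cameras, the 3D joint positions, the bone lengths, and the relative camera poses are refined together by minimizing reprojection error plus a bone-length consistency term. The pinhole model, the parameterization, and the use of SciPy are assumptions for illustration only.

```python
# Hedged sketch of a joint human-pose / camera-pose refinement,
# assuming a simple pinhole model and a single shared focal length.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(points_3d, rvec, tvec, focal):
    """Pinhole projection of Nx3 world points into one camera."""
    cam_pts = Rotation.from_rotvec(rvec).apply(points_3d) + tvec
    return focal * cam_pts[:, :2] / cam_pts[:, 2:3]


def residuals(x, detections, bones, n_cams, n_joints, focal):
    # Unpack the stacked parameter vector: camera poses, joints, bone lengths.
    cam = x[:6 * n_cams].reshape(n_cams, 6)                       # [rvec | tvec] per camera
    joints = x[6 * n_cams:6 * n_cams + 3 * n_joints].reshape(n_joints, 3)
    lengths = x[6 * n_cams + 3 * n_joints:]                       # one length per bone
    res = []
    # Reprojection error of every joint in every camera.
    for c in range(n_cams):
        pred = project(joints, cam[c, :3], cam[c, 3:], focal)
        res.append((pred - detections[c]).ravel())
    # Soft constraint: distance between connected joints matches the bone length.
    for k, (i, j) in enumerate(bones):
        res.append(np.atleast_1d(np.linalg.norm(joints[i] - joints[j]) - lengths[k]))
    return np.concatenate(res)


def refine(x0, detections, bones, n_cams, n_joints, focal=600.0):
    """Nonlinear least-squares refinement of the joint swarm/skeleton state."""
    return least_squares(residuals, x0,
                         args=(detections, bones, n_cams, n_joints, focal)).x
```

In this sketch `detections` is a list of per-camera (n_joints, 2) arrays and `bones` is a list of joint-index pairs; a real system would additionally handle occluded markers, camera intrinsics, and temporal smoothing, and would feed the estimated state to the model predictive controller that positions the swarm.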
