Drone Detection and Pose Estimation Using Relational Graph Networks

With the upsurge in use of Unmanned Aerial Vehicles (UAVs), drone detection and pose estimation by using optical sensors becomes an important research subject in cooperative flight and low-altitude security. The existing technology only obtains the position of the target UAV based on object detection methods. To achieve better adaptability and enhanced cooperative performance, the attitude information of the target drone becomes a key message to understand its state and intention, e.g., the acceleration of quadrotors. At present, most of the object 6D pose estimation algorithms depend on accurate pose annotation or a 3D target model, which costs a lot of human resource and is difficult to apply to non-cooperative targets. To overcome these problems, a quadrotor 6D pose estimation algorithm was proposed in this paper. It was based on keypoints detection (only need keypoints annotation), relational graph network and perspective-n-point (PnP) algorithm, which achieves state-of-the-art performance both in simulation and real scenario. In addition, the inference ability of our relational graph network to the keypoints of four motors was also evaluated. The accuracy and speed were improved significantly compared with the state-of-the-art keypoints detection algorithm.

[1]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Long Quan,et al.  Linear N-Point Camera Pose Determination , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Cemal Aker,et al.  Using deep networks for drone detection , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[6]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[7]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[8]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[9]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[10]  Ashutosh Saxena,et al.  Learning 3-D object orientation from images , 2009, 2009 IEEE International Conference on Robotics and Automation.

[11]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Quan Quan,et al.  Robust Pose Estimation for Multirotor UAVs Using Off-Board Monocular Vision , 2017, IEEE Transactions on Industrial Electronics.

[13]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[14]  Riadh Hajri UAV to UAV Target Detection and Pose Estimation , 2012 .

[15]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jingxuan Sun,et al.  A Camera-Based Target Detection and Positioning UAV System for Search and Rescue (SAR) Purposes , 2016, Sensors.

[18]  Shiqi Li,et al.  A Robust O(n) Solution to the Perspective-n-Point Problem , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Jose Luis Sanchez-Lopez,et al.  A Vision-based Quadrotor Swarm for the participation in the 2013 International Micro Air Vehicle Competition , 2014, 2014 International Conference on Unmanned Aircraft Systems (ICUAS).

[20]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Dacheng Tao,et al.  A Coarse-Fine Network for Keypoint Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Juan M. Corchado,et al.  Detection of Cattle Using Drones and Convolutional Neural Networks , 2018, Sensors.

[23]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Eric Brachmann,et al.  DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[26]  Xiaogang Wang,et al.  Learning Feature Pyramids for Human Pose Estimation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Dustin Tran,et al.  Image Transformer , 2018, ICML.

[28]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[29]  René Vidal,et al.  3D Pose Regression Using Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[30]  Tao Liu,et al.  Monocular-Based 6-Degree of Freedom Pose Estimation Technology for Robotic Intelligent Grasping Systems , 2017, Sensors.

[31]  Konrad Reif,et al.  Stochastic Stability of the Extended Kalman Filter With Intermittent Observations , 2010, IEEE Transactions on Automatic Control.

[32]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Xiangyu Zhang,et al.  Light-Head R-CNN: In Defense of Two-Stage Object Detector , 2017, ArXiv.

[34]  Eric Brachmann,et al.  Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[36]  Vincent Lepetit,et al.  A Novel Representation of Parts for Accurate 3D Object Detection and Tracking in Monocular Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Thia Kirubarajan,et al.  Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software , 2001 .

[38]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[39]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[40]  Dong-Hoon Yi,et al.  A Monocular Vision Sensor-Based Obstacle Detection Algorithm for Autonomous Robots , 2016, Sensors.

[41]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Sanjiv Singh,et al.  A cascaded method to detect aircraft in video imagery , 2011, Int. J. Robotics Res..

[43]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Qi Gao,et al.  A New On-Board UAV Pose Estimation System Based on Monocular Camera , 2016, 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC).

[45]  Jordi Gonzàlez,et al.  Human Pose Estimation from Monocular Images: A Comprehensive Survey , 2016, Sensors.

[46]  Jonathan Tompson,et al.  Towards Accurate Multi-person Pose Estimation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Siddhartha S. Srinivasa,et al.  The MOPED framework: Object recognition and pose estimation for manipulation , 2011, Int. J. Robotics Res..

[49]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[50]  P. Fua,et al.  Detecting Flying Objects Using a Single Moving Camera , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[52]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[53]  Takeshi Naemura,et al.  Learning Multi-frame Visual Representation for Joint Detection and Tracking of Small Objects , 2017, ArXiv.

[54]  Yann LeCun,et al.  GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations , 2018, ArXiv.

[55]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Yukinori Kobayashi,et al.  UAV pose estimation using IR and RGB cameras , 2017, 2017 IEEE/SICE International Symposium on System Integration (SII).

[57]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[58]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[59]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.