A saliency-based reinforcement learning approach for a UAV to avoid flying obstacles

Abstract: Obstacle avoidance is necessary to guarantee the safety of an unmanned aerial vehicle (UAV). However, it is challenging for a UAV to detect and avoid high-speed flying obstacles such as other UAVs or birds. In this paper, we propose a generic framework that integrates an autonomous obstacle detection module and a reinforcement learning (RL) module to develop reactive obstacle avoidance behavior for a UAV. In the obstacle detection module, we design a saliency detection algorithm that uses deep convolutional neural networks (CNNs) to extract monocular visual cues. The algorithm imitates the human visual detection system and can accurately estimate the location of obstacles in the field of view (FOV). The RL module uses an actor–critic structure in which radial basis function (RBF) neural networks approximate the value function and the control policy in continuous state and action spaces. We have tested the effectiveness of the proposed learning framework in a semi-physical experiment. The results show that the proposed saliency detection algorithm outperforms state-of-the-art methods and that the RL algorithm can learn the avoidance behavior from manual experience.
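To make the RL module's structure concrete, the following is a minimal sketch of an actor–critic learner whose value function and policy mean are both linear in Gaussian RBF features. It is not the authors' implementation: the class name `RBFActorCritic`, the fixed exploration noise `sigma`, the learning rates, and the one-dimensional toy task (state = horizontal obstacle offset in the FOV, action = lateral command) are all assumptions chosen for illustration.

```python
import numpy as np


class RBFActorCritic:
    """Actor-critic with a shared Gaussian RBF feature layer (illustrative sketch)."""

    def __init__(self, centers, width, alpha_actor=0.01, alpha_critic=0.1,
                 gamma=0.95, sigma=0.2, seed=0):
        # centers has shape (n_rbf, state_dim); width is a shared RBF width.
        self.centers = np.asarray(centers, dtype=float)
        self.width = float(width)
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma
        self.sigma = sigma  # fixed std of the Gaussian exploration policy (assumed)
        self.rng = np.random.default_rng(seed)
        n_rbf = self.centers.shape[0]
        self.w_critic = np.zeros(n_rbf)  # value-function weights
        self.w_actor = np.zeros(n_rbf)   # policy-mean weights

    def features(self, state):
        # Gaussian RBF activations for a continuous state.
        diff = self.centers - np.asarray(state, dtype=float)
        return np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * self.width ** 2))

    def value(self, state):
        return float(self.w_critic @ self.features(state))

    def mean_action(self, state):
        return float(self.w_actor @ self.features(state))

    def act(self, state):
        # Sample a continuous action from N(mean_action, sigma^2).
        return self.mean_action(state) + self.sigma * self.rng.standard_normal()

    def update(self, state, action, reward, next_state, done):
        phi = self.features(state)
        bootstrap = 0.0 if done else self.gamma * self.value(next_state)
        td_error = reward + bootstrap - self.value(state)
        # Gradient of log N(a; mu, sigma^2) with respect to the actor weights.
        grad_log_pi = (action - self.mean_action(state)) / self.sigma ** 2 * phi
        self.w_critic += self.alpha_critic * td_error * phi
        self.w_actor += self.alpha_actor * td_error * grad_log_pi
        return td_error


def run_toy_episode(agent, steps=50):
    """Hypothetical 1-D task: reward grows as the obstacle leaves the image center."""
    s = np.array([agent.rng.uniform(-0.2, 0.2)])
    for _ in range(steps):
        a = agent.act(s)
        s_next = np.clip(s + 0.1 * np.clip(a, -1.0, 1.0), -1.0, 1.0)
        agent.update(s, a, float(abs(s_next[0])), s_next, done=False)
        s = s_next


if __name__ == "__main__":
    agent = RBFActorCritic(np.linspace(-1.0, 1.0, 9).reshape(-1, 1), width=0.25)
    for _ in range(20):
        run_toy_episode(agent)
    print("V(0) =", agent.value([0.0]), " mean action at 0 =", agent.mean_action([0.0]))
```

In this sketch the critic is trained with a one-step temporal-difference error, and the same error scales the actor's policy-gradient step. In the framework described above, the state would instead come from the obstacle location estimated by the saliency detection module.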
