UAV Autonomous Target Search Based on Deep Reinforcement Learning in Complex Disaster Scene

In recent years, artificial intelligence has played an increasingly important role in the automated control of drones. After AlphaGo used reinforcement learning to defeat the world Go champion, reinforcement learning gained widespread attention. However, most existing applications of reinforcement learning are games with only two or three movement directions. This paper shows that, after further processing, deep reinforcement learning can be applied successfully to the classic Nokia puzzle game Snake, a game with four directions of movement. Through deep reinforcement learning and training, the Snake (a self-learning Snake) learns to find the path to the target autonomously, and its average score in the Snake game exceeds the average human score. An algorithm that can find the target path autonomously has broad industrial prospects, for example UAV inspection of oil and gas fields and using drones to search for and rescue injured people after a complex disaster. Post-disaster relief requires careful dispatching of personnel and materials, and manual relief planning must account for many factors. We therefore aim to design a drone that can search for and rescue people and dispatch materials. Current drones are quite mature in terms of automated control, but they still require manual operation. The autonomous path-finding Snake algorithm proposed here is therefore an initial attempt at, and a key technology for, autonomous search-and-rescue and material-dispatching drones.
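As a rough illustration of four-direction target search with reinforcement learning, here is a minimal sketch under several loudly stated assumptions. The environment `GridTargetEnv` is hypothetical: the "snake" is a single head cell with no body, the state is the target's offset relative to the head, and the rewards (+1 for reaching the target, -1 for hitting a wall, -0.01 per step) are illustrative choices. Plain tabular Q-learning stands in for the deep network used in the paper, and none of the names or parameters come from the paper itself.

```python
import random
from collections import defaultdict

# Snake-style moves: up, down, left, right (y grows downward).
ACTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]


class GridTargetEnv:
    """Hypothetical simplified Snake grid: the 'snake' is a single head cell
    (no body) that must reach a randomly placed target on a size x size board."""

    def __init__(self, size=5, max_steps=50, seed=None):
        self.size = size
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def _random_cell(self):
        return (self.rng.randrange(self.size), self.rng.randrange(self.size))

    def reset(self):
        self.head = self._random_cell()
        self.target = self._random_cell()
        while self.target == self.head:
            self.target = self._random_cell()
        self.steps = 0
        return self._state()

    def _state(self):
        # Assumed state encoding: target offset relative to the head.
        return (self.target[0] - self.head[0], self.target[1] - self.head[1])

    def step(self, action):
        dx, dy = ACTIONS[action]
        nx, ny = self.head[0] + dx, self.head[1] + dy
        self.steps += 1
        if not (0 <= nx < self.size and 0 <= ny < self.size):
            return self._state(), -1.0, True  # hit a wall: episode ends
        self.head = (nx, ny)
        if self.head == self.target:
            return self._state(), 1.0, True  # target reached
        return self._state(), -0.01, self.steps >= self.max_steps


def greedy(q, state):
    # Index of the highest-valued action; ties go to the lowest index.
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    env = GridTargetEnv(seed=seed)
    rng = random.Random(seed + 1)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration over the four directions.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            next_state, reward, done = env.step(action)
            # Q-learning target: r if the episode ended (timeouts included,
            # for simplicity), else r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def success_rate(q, episodes=200, seed=123):
    # Fraction of fresh episodes in which the greedy policy reaches the target.
    env = GridTargetEnv(seed=seed)
    hits = 0
    for _ in range(episodes):
        state, done, reward = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(greedy(q, state))
        hits += reward == 1.0
    return hits / episodes


if __name__ == "__main__":
    q_table = train()
    print(f"greedy success rate: {success_rate(q_table):.2f}")
```

A lookup table is used only to keep the example short. With a richer observation such as the raw game screen, the table would be replaced by a neural network that approximates Q(s, a), which is the idea behind deep Q-learning.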