Dynamic value iteration networks for the planning of rapidly changing UAV swarms

In an unmanned aerial vehicle ad-hoc network (UANET), the nodes are unmanned aerial vehicles (UAVs) that are sparsely distributed and move rapidly, so the network topology can change dynamically. These changes may degrade UANET service performance. In this study, we propose a dynamic value iteration network (DVIN) model for planning rapidly changing UAV swarms. The model is trained with the episodic Q-learning method on the connection information of UANETs and generates a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We evaluate the performance of the DVIN model and compare it with the non-dominated sorting genetic algorithm II (NSGA-II) and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning while maintaining a high average success rate.
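To make the idea of spreading state values over connection information concrete, the following is a minimal sketch of classical tabular value iteration on a small connectivity graph. It is not the DVIN model itself, which is a learned network trained with episodic Q-learning. The function names (`value_iteration`, `greedy_path`), the five-node topology, the per-hop reward of -1, and the discount factor of 0.9 are all assumptions chosen for illustration, and links are assumed to be symmetric.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node id -> ids of nodes it currently has a link to


def value_iteration(adjacency: Graph, goal: int, gamma: float = 0.9,
                    step_reward: float = -1.0, sweeps: int = 50) -> Dict[int, float]:
    """Spread state values outward from the goal over the current links.

    Illustrative assumptions: every hop costs `step_reward`, the goal is
    terminal with value 0, and links are symmetric.
    """
    values = {node: 0.0 for node in adjacency}
    for _ in range(sweeps):
        updated = {}
        for node, neighbours in adjacency.items():
            if node == goal:
                updated[node] = 0.0
            elif not neighbours:
                # An isolated node cannot reach the goal at all.
                updated[node] = float("-inf")
            else:
                # Bellman backup: take the best one-hop move.
                updated[node] = max(step_reward + gamma * values[n]
                                    for n in neighbours)
        values = updated
    return values


def greedy_path(adjacency: Graph, values: Dict[int, float],
                start: int, goal: int) -> List[int]:
    """Follow the highest-valued neighbour until the goal or a hop limit."""
    path = [start]
    current = start
    while current != goal and len(path) <= len(adjacency):
        neighbours = adjacency[current]
        if not neighbours:
            break
        current = max(neighbours, key=lambda n: values[n])
        path.append(current)
    return path


# Hypothetical topology: a ring of five nodes, with node 3 as the destination.
links_before: Graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
values_before = value_iteration(links_before, goal=3)
path_before = greedy_path(links_before, values_before, start=0, goal=3)

# The link between nodes 4 and 3 drops out; the values are recomputed.
links_after: Graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
values_after = value_iteration(links_after, goal=3)
path_after = greedy_path(links_after, values_after, start=0, goal=3)
```

In this sketch the route from node 0 is `[0, 4, 3]` before the link drops and `[0, 1, 2, 3]` afterwards. Re-running the sweeps after each topology change is the cost that a learned value-spreading model would presumably try to avoid.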
