Time-Varying Formation Controllers for Unmanned Aerial Vehicles Using Deep Reinforcement Learning

We consider the problem of designing scalable and portable controllers that drive unmanned aerial vehicles (UAVs) into time-varying formations as quickly as possible. This brief shows that deep reinforcement learning can be applied in a multi-agent fashion to drive UAVs into arbitrary formations while accounting for both optimality and portability. A deep neural network estimates the value of each state, and each agent chooses its actions according to these estimates. The system is tested with several different low-dimensional sensory inputs. Each new input required only additional training, with no change to the neural network architecture, the algorithm, or the hyperparameters.
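The idea that a value estimate drives each UAV's action choice can be illustrated with a small sketch. It is not the paper's implementation and rests on our own assumptions. The UAVs move on a 2-D integer grid with nine moves, including hovering. A hypothetical circular formation has a centre that drifts over time (`formation_slot`). A hand-written `value_estimate` stands in for the trained deep network. Every UAV applies the same one-step greedy rule to its own offset from its assigned slot, so a single shared estimate controls all agents in a decentralized way.

```python
"""Sketch: decentralized greedy control toward a time-varying formation.

Hypothetical assumptions (not from the paper): 2-D integer grid, nine moves
including hover, reward -1 per step until a UAV sits on its slot, and a
hand-written value estimate standing in for the trained deep network.
"""
import math
from typing import Callable, List, Tuple

Vec = Tuple[int, int]
# Nine candidate moves: each axis changes by -1, 0 or +1 (0, 0 = hover).
ACTIONS: List[Vec] = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def chebyshev(a: Vec, b: Vec) -> int:
    # Minimum number of 8-neighbour grid moves between a and b.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def value_estimate(offset: Vec) -> float:
    # Stand-in for the learned V(s): minus the number of steps still needed
    # to close the offset between a UAV and its slot.
    return -float(chebyshev(offset, (0, 0)))


def formation_slot(i: int, n: int, t: int) -> Vec:
    # Hypothetical time-varying formation: n slots on a circle of radius 4
    # whose centre shifts one cell in +x every other time step.
    cx = t // 2
    angle = 2 * math.pi * i / n
    return (cx + round(4 * math.cos(angle)), round(4 * math.sin(angle)))


def choose_action(pos: Vec, next_slot: Vec, V: Callable[[Vec], float]) -> Vec:
    # One-step lookahead: score each move by reward + V(successor offset)
    # and keep the best one (max returns the first move on ties).
    def score(a: Vec) -> float:
        new = (pos[0] + a[0], pos[1] + a[1])
        off = (next_slot[0] - new[0], next_slot[1] - new[1])
        reward = 0.0 if off == (0, 0) else -1.0
        return reward + V(off)

    return max(ACTIONS, key=score)


def simulate(n: int = 4, steps: int = 30) -> List[int]:
    # All UAVs start at the origin; each acts only on its own offset,
    # using the same value estimate (shared, decentralized policy).
    positions: List[Vec] = [(0, 0)] * n
    for t in range(steps):
        new_positions: List[Vec] = []
        for i, p in enumerate(positions):
            a = choose_action(p, formation_slot(i, n, t + 1), value_estimate)
            new_positions.append((p[0] + a[0], p[1] + a[1]))
        positions = new_positions
    # Remaining grid distance from each UAV to its slot at the final step.
    return [chebyshev(p, formation_slot(i, n, steps)) for i, p in enumerate(positions)]


print(simulate())  # -> [0, 0, 0, 0]
```

Each agent sees only its relative offset, so the same estimate serves any number of UAVs and any formation shape in this toy setting. Training the value network and collision avoidance are deliberately left out.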
