Energy-Efficient Online Path Planning of Multiple Drones Using Reinforcement Learning

Drones, typically unmanned aerial vehicles (UAVs), serve many purposes. However, operating multiple drones simultaneously is challenging because each drone must account for real-time interactions and the environment, avoiding collisions with other drones and obstacles. The proposed Advanced TD3 model performs energy-efficient path planning on the drone itself, at the edge level. We modify the twin-delayed deep deterministic policy gradient (TD3), a state-of-the-art policy gradient reinforcement learning (RL) algorithm. A frame stacking technique lets the TD3 model take the continuous action space of the drone into account. During training, we gradually increase the observation range of the agents for fast and stable convergence. We train the modified TD3 model through offline RL to reduce the training overhead. The converged RL model is then deployed on each drone's onboard computer. In flight, the Advanced TD3 model selects an energy-efficient path in real time without incurring the training overhead, considering external factors such as wind or other drones. The total energy consumption of drones that fly with online path planning is approximately 106% of that of drones that follow offline path planning, while the trained TD3 model does not require complex computations for real-time execution.
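As an illustration of two of the training ingredients named above, the following minimal Python sketch shows one possible form of frame stacking and of a gradually widening observation range. It is not the implementation used in this work: the names `FrameStack` and `observation_range`, the zero-filled initial frames, and the linear growth schedule are all assumptions made for the example.

```python
from collections import deque


class FrameStack:
    """Keep the k most recent observations and expose them as one flat vector.

    Hypothetical helper: the stack depth k and the flat layout are assumptions.
    """

    def __init__(self, k, obs_dim):
        if k <= 0 or obs_dim <= 0:
            raise ValueError("k and obs_dim must be positive")
        self.k = k
        self.obs_dim = obs_dim
        # Zero-filled frames give the stacked vector a fixed size from the start.
        self.frames = deque([[0.0] * obs_dim for _ in range(k)], maxlen=k)

    def _check(self, obs):
        if len(obs) != self.obs_dim:
            raise ValueError("observation has the wrong dimension")

    def reset(self, obs):
        """Fill the whole stack with the first observation of an episode."""
        self._check(obs)
        for _ in range(self.k):
            self.frames.append(list(obs))
        return self.stacked()

    def push(self, obs):
        """Add the newest observation; the oldest one is dropped automatically."""
        self._check(obs)
        self.frames.append(list(obs))
        return self.stacked()

    def stacked(self):
        """Return the frames concatenated oldest-first into one list."""
        return [value for frame in self.frames for value in frame]


def observation_range(step, total_steps, r_min, r_max):
    """Grow the sensing radius from r_min to r_max over training.

    The linear schedule is an assumption; the text only says "gradually".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return r_min + frac * (r_max - r_min)


if __name__ == "__main__":
    stack = FrameStack(k=3, obs_dim=2)
    print(stack.reset([1.0, 2.0]))
    print(stack.push([3.0, 4.0]))
    print(observation_range(50, 100, r_min=1.0, r_max=5.0))
```

In such a setup, the stacked vector would be the input to the actor and critic networks, and the observation range would be updated once per training step or episode before the agents sense their surroundings.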