Reinforcement Learning for UAV Autonomous Navigation, Mapping and Target Detection

In this paper, we study a joint detection, mapping and navigation problem for a single unmanned aerial vehicle (UAV) equipped with a low-complexity radar and flying in an unknown environment. The goal is to optimize the UAV's trajectory so as to maximize the mapping accuracy while avoiding areas where measurements might not be sufficiently informative for target detection. The problem is formulated as a Markov decision process (MDP) in which the UAV is an agent that runs both a state estimator, for target detection and environment mapping, and a reinforcement learning (RL) algorithm to infer its own navigation policy (i.e., the control law). Numerical results show the feasibility of the proposed idea, highlighting the UAV's capability to autonomously explore areas with a high probability of target detection while reconstructing the surrounding environment.
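
To make the MDP and RL ingredients concrete, the following is a minimal sketch of tabular Q-learning on a small grid. It is not the algorithm used in the paper: the grid, the `info` map of per-cell measurement informativeness, the function name `q_learning`, and all hyperparameters are hypothetical and chosen only for illustration. The state is the UAV's cell, the actions are four axis-aligned moves, and the reward is the assumed informativeness of the cell that is entered.

```python
import random

# Hypothetical action set: up, down, left, right on a grid.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def q_learning(reward_map, episodes=300, steps=20,
               alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning; the reward is the value of the cell entered."""
    rng = random.Random(seed)
    rows, cols = len(reward_map), len(reward_map[0])
    q = [[0.0] * len(MOVES) for _ in range(rows * cols)]
    for _ in range(episodes):
        r, c = 0, 0  # every episode starts in the top-left cell
        for _ in range(steps):
            s = r * cols + c
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(len(MOVES))
            else:
                a = max(range(len(MOVES)), key=lambda i: q[s][i])
            dr, dc = MOVES[a]
            # Moves that would leave the grid keep the UAV at the border.
            nr = min(max(r + dr, 0), rows - 1)
            nc = min(max(c + dc, 0), cols - 1)
            s_next = nr * cols + nc
            # One-step Q-learning update toward the bootstrapped target.
            target = reward_map[nr][nc] + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            r, c = nr, nc
    return q

# Assumed informativeness of a measurement taken in each cell (illustrative).
info = [[0.1, 0.2, 0.1],
        [0.2, 0.5, 0.9],
        [0.1, 0.3, 0.2]]
q = q_learning(info)
```

Here `q[s][a]` estimates the discounted return of taking move `a` in cell `s`, so the greedy policy steers toward cells assumed to be more informative. In the setting described above, the reward would instead come from the state estimator's mapping and detection performance rather than from a fixed map.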