Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment

An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower needed for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps UAVs choose the correct action in each state according to a policy. In an unknown environment, hand-crafting rules for action selection is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by the proposed algorithm has a greater probability of choosing the optimal action than one following the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information. The performance of the algorithm is evaluated in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
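
For orientation, the classic Q-learning update that the abstract takes as its baseline can be sketched as below, together with one possible way an agent might reuse values from a similar, already visited state. The abstract does not specify the matching rule, so `nearest_known_state`, the Euclidean distance, the tolerance `tol`, and the tuple-valued states are all assumptions made for illustration, not the authors' algorithm.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Classic one-step Q-learning update on a tabular Q function."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def nearest_known_state(state, known_states, tol):
    """Hypothetical matching rule: closest visited state within `tol`, else None."""
    best, best_dist = None, tol
    for candidate in known_states:
        dist = sum((a - b) ** 2 for a, b in zip(state, candidate)) ** 0.5
        if dist <= best_dist:
            best, best_dist = candidate, dist
    return best


def greedy_action(q, state, actions, known_states, tol=1.0):
    """Greedy action; an unseen state borrows the values of a similar visited one."""
    if state not in known_states:
        match = nearest_known_state(state, known_states, tol)
        if match is None:
            # Assumption: with no similar state available, act at random.
            return random.choice(actions)
        state = match
    return max(actions, key=lambda a: q[(state, a)])


# Usage: states are (x, y) tuples and actions are labels, both illustrative.
actions = ["spray", "move"]
q = defaultdict(float)
q_update(q, (0.0, 0.0), "spray", 1.0, (1.0, 0.0), actions)
known = {(0.0, 0.0)}
chosen = greedy_action(q, (0.2, 0.1), actions, known)
```

In this sketch an unseen state such as `(0.2, 0.1)` falls back to the values stored for the nearby visited state `(0.0, 0.0)`, whereas a plain tabular agent would have no learned values to rank its actions by.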
