Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks

Maximum target coverage by adjusting the orientations of distributed sensors is an important problem in directional sensor networks (DSNs). The problem is challenging because targets usually move randomly while each sensor's coverage range is limited in both angle and distance. Sensors therefore need to be coordinated to achieve ideal target coverage with low power consumption, e.g., missing no targets while reducing redundant coverage. To this end, we propose Hierarchical Target-oriented Multi-Agent Coordination (HiT-MAC), which decomposes the target coverage problem into two-level tasks: target assignment by a coordinator and tracking of assigned targets by executors. Specifically, the coordinator periodically monitors the environment globally and allocates targets to each executor; in turn, each executor only needs to track its assigned targets. To learn HiT-MAC effectively by reinforcement learning, we further introduce several practical techniques, including a self-attention module, marginal contribution approximation for the coordinator, and a goal-conditional observation filter for the executor. Empirical results demonstrate the advantages of HiT-MAC over baselines in coverage rate, learning efficiency, and scalability. We also conduct an ablation analysis of the effectiveness of the introduced components in the framework.
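
To make the two-level decomposition concrete, the sketch below shows how one coordination period might interleave a coordinator round with executor steps: the coordinator allocates targets to sensors, and each executor then rotates its directional sensor toward its assigned targets. This is a minimal illustration only; it substitutes a greedy nearest-sensor assignment for the learned policies, and all names (Coordinator, Executor, assign_targets, track) are hypothetical, not taken from the paper's implementation.

```python
# Minimal sketch of a coordinator/executor loop for directional sensor coverage.
# The greedy assignment stands in for the learned coordinator policy.
import numpy as np

class Coordinator:
    """Periodically observes all sensors and targets, then assigns targets to executors."""
    def assign_targets(self, sensor_poses, target_poses):
        assignment = {i: [] for i in range(len(sensor_poses))}
        for t, tp in enumerate(target_poses):
            # Assign each target to its nearest sensor (illustrative heuristic only).
            nearest = int(np.argmin([np.linalg.norm(tp - sp) for sp in sensor_poses]))
            assignment[nearest].append(t)
        return assignment

class Executor:
    """Rotates one directional sensor toward the centroid of its assigned targets."""
    def __init__(self, pose):
        self.pose = pose          # (x, y) position of the sensor
        self.orientation = 0.0    # current facing angle in radians

    def track(self, assigned_targets):
        if len(assigned_targets) == 0:
            return self.orientation  # nothing assigned: hold the current orientation
        centroid = np.mean(assigned_targets, axis=0)
        self.orientation = np.arctan2(centroid[1] - self.pose[1],
                                      centroid[0] - self.pose[0])
        return self.orientation

# One coordination period: global assignment, then local tracking by each executor.
sensor_poses = np.array([[0.0, 0.0], [10.0, 0.0]])
target_poses = np.array([[1.0, 2.0], [9.0, -1.0], [11.0, 3.0]])
coordinator = Coordinator()
executors = [Executor(p) for p in sensor_poses]
assignment = coordinator.assign_targets(sensor_poses, target_poses)
for i, executor in enumerate(executors):
    executor.track(target_poses[assignment[i]])
```

In the actual framework both levels are learned policies trained by reinforcement learning (with the attention, marginal-contribution, and observation-filter components described above) rather than the fixed heuristics used here for illustration.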
