DeepMPR: Enhancing Opportunistic Routing in Wireless Networks through Multi-Agent Deep Reinforcement Learning

Opportunistic routing exploits the broadcast nature of wireless networks, providing higher reliability and robustness in highly dynamic and/or harsh environments such as mobile and vehicular ad-hoc networks (MANETs/VANETs). To reduce the cost of broadcasting, multicast routing schemes use a connected dominating set (CDS) or multi-point relay (MPR) set to limit network overhead, so the algorithms that select these sets are critical. Common MPR selection algorithms are heuristic, rely on coordination between nodes, require high computational power for large networks, and are difficult to tune under network uncertainty. In this paper, we use multi-agent deep reinforcement learning to design a novel MPR multicast routing technique, DeepMPR, which outperforms the OLSR MPR selection algorithm while requiring no MPR announcement messages from neighbors. Our evaluation results demonstrate the performance gains of the trained DeepMPR multicast forwarding policy over other popular techniques.
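For context, the heuristic baseline that DeepMPR is compared against can be sketched as follows. This is a minimal illustration of the classic greedy MPR selection in the spirit of OLSR (RFC 3626): first pick the 1-hop neighbors that are the only path to some 2-hop neighbor, then greedily add whichever neighbor covers the most still-uncovered 2-hop neighbors. The node names and topology below are hypothetical, and the sketch omits OLSR details such as willingness values and link quality.

```python
def select_mprs(one_hop, two_hop_of):
    """Greedy MPR selection sketch (OLSR-style heuristic).

    one_hop:    set of 1-hop neighbor ids.
    two_hop_of: dict mapping each 1-hop neighbor to the set of strict
                2-hop neighbors reachable through it.
    Returns a set of relays that covers every 2-hop neighbor.
    """
    uncovered = set().union(*two_hop_of.values()) if two_hop_of else set()
    mprs = set()

    # Step 1: neighbors that are the sole route to some 2-hop node are mandatory.
    for n in one_hop:
        reached_via_others = set().union(
            *(two_hop_of.get(m, set()) for m in one_hop if m != n)
        ) if len(one_hop) > 1 else set()
        if two_hop_of.get(n, set()) - reached_via_others:
            mprs.add(n)
            uncovered -= two_hop_of[n]

    # Step 2: greedily add the neighbor covering the most remaining 2-hop nodes.
    while uncovered:
        best = max(one_hop - mprs,
                   key=lambda n: len(two_hop_of.get(n, set()) & uncovered))
        if not two_hop_of.get(best, set()) & uncovered:
            break  # remaining 2-hop nodes unreachable via any 1-hop neighbor
        mprs.add(best)
        uncovered -= two_hop_of[best]
    return mprs
```

For example, with neighbors A, B, C where A alone reaches 2-hop node X, the heuristic must select A and then one of B or C to cover the rest. Such heuristics need each node's full 2-hop topology, which in OLSR is obtained through periodic HELLO message exchange; this per-node coordination overhead is part of what the learned policy aims to avoid.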
