QoS Routing in MANETs with Imprecise Information Using Actor-Critic Reinforcement Learning

This paper proposes a path discovery scheme that supports delay-constrained least-cost routing in MANETs. The aim of the scheme is to maximise the probability of finding feasible paths while keeping communication overhead under control in the presence of information uncertainty. The problem is viewed as a partially observable Markov decision process (POMDP) and is solved using an actor-critic reinforcement learning (RL) method. The scheme relies on approximate belief states of the environment, which capture the network state uncertainty. Numerical results obtained under various scenarios of state uncertainty and stringent QoS requirements show that the proposed RL framework can lead to more efficient control of search messages, i.e., a reduction of up to 63% in the average number of search messages with a marginal reduction of up to 3% in success ratio compared with a flooding scheme.
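To make the approach concrete, the sketch below shows a generic actor-critic update over belief-state features, of the kind a forwarding node could apply when deciding which neighbour receives the next path-search probe. It is a minimal illustration under assumed choices (a linear TD(0) critic, a softmax actor, hypothetical feature and action sizes, and toy rewards of -1 per probe sent and +10 for discovering a feasible path); it is not the paper's exact algorithm or parameterisation.

```python
import numpy as np

# Minimal actor-critic sketch over approximate belief states (an illustrative
# assumption, not the paper's exact algorithm). The "belief" is a feature
# vector summarising uncertain network-state estimates at a node; actions are
# candidate next hops for forwarding a path-search (probe) message.

rng = np.random.default_rng(0)

N_FEATURES = 4       # size of the belief feature vector (assumed)
N_ACTIONS = 3        # number of candidate next hops (assumed)
ALPHA_ACTOR = 0.01   # actor step size
ALPHA_CRITIC = 0.05  # critic step size
GAMMA = 0.95         # discount factor

theta = np.zeros((N_ACTIONS, N_FEATURES))  # actor: softmax policy parameters
w = np.zeros(N_FEATURES)                    # critic: linear value-function weights

def policy(belief):
    """Softmax over next-hop preferences given the belief features."""
    prefs = theta @ belief
    prefs -= prefs.max()                    # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def step(belief, action, reward, next_belief, done):
    """One actor-critic update from a single forwarding decision."""
    global theta, w
    v = w @ belief
    v_next = 0.0 if done else w @ next_belief
    td_error = reward + GAMMA * v_next - v  # critic's TD error drives both updates

    w += ALPHA_CRITIC * td_error * belief   # critic update (TD(0))

    probs = policy(belief)
    grad_log = -np.outer(probs, belief)     # d log pi / d theta for every action...
    grad_log[action] += belief              # ...plus the chosen action's own term
    theta += ALPHA_ACTOR * td_error * grad_log

# Toy usage: each probe sent costs -1, discovering a feasible path pays +10;
# the beliefs here are random stand-ins for real state estimates.
belief = rng.random(N_FEATURES)
for t in range(200):
    a = rng.choice(N_ACTIONS, p=policy(belief))
    done = rng.random() < 0.1               # pretend the probe reached the destination
    reward = 10.0 if done else -1.0
    next_belief = rng.random(N_FEATURES)
    step(belief, a, reward, next_belief, done)
    belief = rng.random(N_FEATURES) if done else next_belief
```

In this setting the critic's TD error both scores the current forwarding decision and steers the actor toward next hops that reach the destination with fewer probes, which is the intuition behind reducing search-message overhead while preserving the success ratio.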
