Online learning in autonomic multi-hop wireless networks for transmitting mission-critical applications

In this paper, we study how to optimize the transmission decisions of nodes that support mission-critical applications such as surveillance, security monitoring, and military operations. We focus on a network scenario in which multiple source nodes simultaneously transmit mission-critical data through relay nodes to one or more destinations in a multi-hop wireless Mission-Critical Network (MCN). In such a network, the wireless nodes can be modeled as agents that acquire local information from their neighbors and, based on this information, make timely transmission decisions that minimize the end-to-end delays of the mission-critical applications. Importantly, in practice the MCN must cope with time-varying network dynamics. Hence, the agents need to make transmission decisions by considering not only the current network status, but also how that status evolves over time and how it is influenced by the actions the nodes take. We formulate the agents' autonomic decision-making problem as a Markov decision process (MDP) and construct a distributed MDP framework that takes into account the informationally decentralized nature of the multi-hop MCN. We further propose an online model-based reinforcement learning approach that lets the agents solve the distributed MDP at runtime by modeling the network dynamics using priority queuing. We compare the proposed model-based approach with model-free reinforcement learning approaches in the MCN. The results show that the proposed approach outperforms both myopic approaches without learning capability and conventional model-free reinforcement learning approaches.
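The difference between a myopic decision and one that accounts for downstream delay can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a deterministic special case in which each node averages observed per-hop delays into an empirical model and then runs value iteration over that model to pick a next hop. The topology, the delay samples, and the function names `estimate_link_delays` and `value_iteration` are all hypothetical.

```python
def estimate_link_delays(samples):
    """Average observed per-hop delays into a crude empirical model (assumption)."""
    return {link: sum(obs) / len(obs) for link, obs in samples.items()}


def value_iteration(delays, destination, sweeps=50):
    """Compute each node's expected delay-to-destination and best next hop.

    Repeatedly applies V[u] = min over next hops v of (delay[u, v] + V[v]),
    updating values in place, with V[destination] fixed at 0.
    """
    nodes = {n for link in delays for n in link}
    V = {n: (0.0 if n == destination else float("inf")) for n in nodes}
    policy = {}
    for _ in range(sweeps):
        for (u, v), d in delays.items():
            if u != destination and d + V[v] < V[u]:
                V[u] = d + V[v]
                policy[u] = v
    return V, policy


# Hypothetical delay samples (ms) per directed link: source s, relays r1 and r2,
# destination d.
samples = {
    ("s", "r1"): [2.0, 4.0],
    ("s", "r2"): [1.0, 1.0],
    ("r1", "d"): [1.0, 1.0],
    ("r2", "d"): [6.0, 4.0],
}

delays = estimate_link_delays(samples)
V, policy = value_iteration(delays, "d")

# A myopic node looks only at the delay of its own outgoing links.
myopic_next_hop = min(
    (v for (u, v) in delays if u == "s"), key=lambda v: delays[("s", v)]
)

print("foresighted next hop from s:", policy["s"], "expected delay:", V["s"])
print("myopic next hop from s:", myopic_next_hop)
```

In this toy topology the myopic choice forwards to `r2` because its first hop is fastest, while the value-iteration policy forwards to `r1` because the total expected delay to the destination is lower. The setting studied in the paper is stochastic and distributed, so this sketch only conveys the intuition.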
