Collaborative Anti-jamming in Cognitive Radio Networks Using Minimax-Q Learning

Cognitive radio is an efficient technique for realizing dynamic spectrum access. Because secondary users (SUs) in a cognitive radio network (CRN) are susceptible to random jammers, securing the SUs' channel access is crucial for the CRN framework. The rapidly varying spectrum dynamics of the CRN, combined with the jammer's actions, create a challenging scenario that is generally modeled using stochastic zero-sum games and Markov decision processes (MDPs). To learn the channel dynamics and the jammer's strategy, the SUs use reinforcement learning (RL) algorithms such as Minimax-Q learning. In this paper, we propose multi-agent, multi-band collaborative anti-jamming, in which the SUs combat a single jammer using the Minimax-Q learning algorithm and collaborate by sharing learned policies or episodes. We show that sharing learned policies or episodes increases the probability that the SUs learn the jammer's strategies, but that the reward decreases as the cost of communication increases. Simulation results show that collaborative anti-jamming with Minimax-Q learning improves an SU's learning probability compared with a single SU fighting the jammer alone.
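As a rough illustration of the Minimax-Q learning mentioned above, the following is a minimal sketch of the standard update rule, Q(s,a,o) ← (1-α)Q(s,a,o) + α(r + γV(s')), with V(s) = max over mixed strategies π of min over jammer actions o of Σ_a π(a)Q(s,a,o). It is not the paper's implementation. It assumes a single SU with exactly two actions (for example, two channels), which lets the mixed-strategy minimax value be found exactly without a linear-programming solver. The function names, the table layout `Q[s][a][o]`, and the default values of `alpha` and `gamma` are all hypothetical.

```python
from itertools import combinations


def solve_two_action_game(q):
    """Minimax value of a zero-sum matrix game in which the SU has two actions.

    q[a][o] is the SU's payoff for SU action a (0 or 1) against jammer action o.
    Returns (value, p), where p is the probability of playing action 0.
    Assumption: the SU has exactly two actions, so the optimum lies at p = 0,
    p = 1, or where two of the jammer's payoff lines cross.
    """
    n_opp = len(q[0])

    def worst_case(p):
        # Expected payoff against the jammer's best response to mixture p.
        return min(p * q[0][o] + (1 - p) * q[1][o] for o in range(n_opp))

    candidates = [0.0, 1.0]
    for i, j in combinations(range(n_opp), 2):
        denom = (q[0][i] - q[1][i]) - (q[0][j] - q[1][j])
        if denom != 0:
            p = (q[1][j] - q[1][i]) / denom  # crossing point of lines i and j
            if 0 <= p <= 1:
                candidates.append(p)

    best_p = max(candidates, key=worst_case)
    return worst_case(best_p), best_p


def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One Minimax-Q step for state s, SU action a, and jammer action o.

    Q[s][a][o] and V[s] are updated in place; alpha and gamma are
    illustrative defaults, not values taken from the paper.
    """
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (r + gamma * V[s_next])
    V[s], p = solve_two_action_game(Q[s])
    return p  # updated probability of choosing action 0 in state s


if __name__ == "__main__":
    # Hypothetical single-state example: the SU used channel 0, the jammer hit channel 1.
    Q = {0: [[0.0, 0.0], [0.0, 0.0]]}
    V = {0: 0.0}
    minimax_q_update(Q, V, s=0, a=0, o=1, r=1.0, s_next=0)
    print(Q[0], V[0])
```

With more than two SU actions, the inner maximization is usually posed as a small linear program instead of the closed-form search used here.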
