Anti-jamming in cognitive radio networks using reinforcement learning algorithms

Cognitive radio technology is a promising approach to enhancing spectrum utilization. Because a cognitive radio network (CRN) is prone to random attacks, security is an important issue for its successful deployment. In a CRN, the dynamic spectrum characteristics of the channel change rapidly, and the presence of a random jammer makes the scenario even more challenging to model. This scenario is modeled using a stochastic zero-sum game and a Markov decision process (MDP) framework. The secondary user can learn both the time-varying channel characteristics and the jammer's random strategy using reinforcement learning (RL) algorithms. In this paper, we propose using the QV and State-Action-Reward-State-Action (SARSA) RL algorithms in place of the previously proposed Minimax-Q learning. Although Minimax-Q learning tries to achieve the optimal solution, in the anti-jamming scenario the optimal solution may not be the best choice, since maximizing the gain is not the primary objective there. Minimax-Q learning is an off-policy, greedy algorithm, whereas QV and SARSA are on-policy algorithms. QV learning performs even better than SARSA because QV updates both the Q-values and the V-values of the game. Simulation results show an improvement in the learning probability of the secondary user when the SARSA and QV learning algorithms are used instead of the Minimax-Q learning algorithm.
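The distinction drawn above between SARSA and QV learning comes down to their update rules: SARSA bootstraps its Q-values from the next action actually taken, while QV learning additionally maintains a state-value table V and bootstraps Q from it. A minimal sketch (not the paper's implementation; the two-channel toy setup, learning rates, and reward values are illustrative assumptions):

```python
# Hedged sketch contrasting the SARSA and QV-learning update rules.
# The toy two-channel setup and all constants below are assumptions,
# not values from the paper.

ALPHA = 0.1   # learning rate for Q-values
BETA = 0.1    # learning rate for V-values (QV-learning only)
GAMMA = 0.9   # discount factor

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy SARSA: bootstrap from the action actually taken next."""
    Q[s][a] += ALPHA * (r + GAMMA * Q[s_next][a_next] - Q[s][a])

def qv_update(Q, V, s, a, r, s_next):
    """QV-learning: V(s) is learned by TD(0), and Q(s, a) bootstraps
    from V(s') rather than from Q itself, so both tables are updated."""
    V[s] += BETA * (r + GAMMA * V[s_next] - V[s])
    Q[s][a] += ALPHA * (r + GAMMA * V[s_next] - Q[s][a])

if __name__ == "__main__":
    # Two states (channels) and two actions (stay / hop), zero-initialized.
    Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    V = {0: 0.0, 1: 0.0}
    # One transition: in state 0, action 1 yields reward 1.0, landing in state 1.
    qv_update(Q, V, 0, 1, 1.0, 1)
    print(V[0], Q[0][1])  # both estimates move toward the TD target
```

Because the Q-update in QV learning uses the separately learned V(s') as its target, information gathered under any action in s' contributes to every Q(s, ·) backup, which is one intuition for why updating both tables can speed up learning relative to SARSA.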
