Paper Information - Competing Cognitive Resilient Networks

Competing Cognitive Resilient Networks

We introduce the competing cognitive resilient network (CCRN), a network of mobile radios challenged to optimize data throughput and networking efficiency under dynamic spectrum access and adversarial threats (e.g., jamming). Unlike conventional approaches, a CCRN features both communicator and jamming nodes in a friendly coalition that takes joint actions against hostile networking entities. In particular, this paper showcases hypothetical blue-force and red-force CCRNs and their competition for open spectrum resources. We present state-agnostic and stateful solution approaches based on a decision-theoretic framework. The state-agnostic approach builds on the multiarmed bandit to develop an optimal strategy that balances exploration and exploitation through sequential sampling of channel rewards. The stateful approach explicitly models states and actions with an underlying Markov decision process and uses multiagent Q-learning to compute optimal node actions. We provide a theoretical framework for CCRN and propose new algorithms for both approaches. Simulation results indicate that the proposed algorithms outperform some of the most important algorithms known to date.
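To make the state-agnostic idea concrete, the following is a minimal sketch of a multiarmed bandit choosing among channels by sequentially sampling their rewards. It is not the algorithm proposed in the paper: it uses the standard UCB1 index rule as a stand-in, assumes stationary Bernoulli channel rewards, and models no jammer or opposing network. The names `ucb1_select`, `run_ucb1`, and `success_probs` are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the channel index chosen by the UCB1 rule at round t (t >= 1)."""
    # Try every channel once before trusting any estimate.
    for ch, n in enumerate(counts):
        if n == 0:
            return ch
    # Empirical mean reward plus an exploration bonus that shrinks
    # as a channel is sampled more often.
    scores = [
        sums[ch] / counts[ch] + math.sqrt(2.0 * math.log(t) / counts[ch])
        for ch in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(success_probs, horizon, seed=0):
    """Simulate UCB1 on channels with assumed Bernoulli success probabilities."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k   # number of times each channel was used
    sums = [0.0] * k   # total reward collected on each channel
    for t in range(1, horizon + 1):
        ch = ucb1_select(counts, sums, t)
        # Reward 1 if the transmission succeeds, 0 otherwise (assumption).
        reward = 1.0 if rng.random() < success_probs[ch] else 0.0
        counts[ch] += 1
        sums[ch] += reward
    return counts, sums


if __name__ == "__main__":
    counts, sums = run_ucb1([0.2, 0.5, 0.9], horizon=2000)
    print("plays per channel:", counts)
```

In this sketch the exploration bonus drives early sampling of all channels, after which play concentrates on the channel with the highest empirical reward. A stateful treatment, as described in the abstract, would instead track explicit states and learn action values with Q-learning.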
