Distributed strategic learning with application to network security

In this paper we consider a class of two-player nonzero-sum stochastic games with incomplete information. We develop fully distributed reinforcement learning algorithms that require each player to have only minimal information about the other player. At each time step, each player is either in an active mode or in a sleep mode. When active, a player updates her strategy and her estimates of the unknown quantities using a specific pure or hybrid learning pattern. Using stochastic approximation techniques, we show that, under appropriate conditions, the pure or hybrid learning schemes with random updates can be studied through their deterministic ordinary differential equation (ODE) counterparts. Convergence to state-independent equilibria is analyzed for specific payoff functions. The results are applied to a class of security games in which the attacker and the defender adopt different learning schemes and update their strategies at random times.
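
To give a concrete feel for the kind of scheme described above, the sketch below is a minimal illustration (ours, not the paper's algorithm): a payoff-based learning rule for a 2x2 attacker-defender game in which each player observes only her own realized payoff, is active with some probability at each step, and, when active, updates a payoff estimate and plays a Boltzmann (softmax) strategy. The payoff matrices, activity probabilities, step sizes, and temperature are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 attacker-defender payoffs (assumed, not from the paper).
# Rows: defender actions (monitor, idle); columns: attacker actions (attack, wait).
U_DEF = np.array([[ 1.0, -0.2],
                  [-1.0,  0.5]])
U_ATT = np.array([[-1.0,  0.2],
                  [ 1.0, -0.5]])

class Player:
    """Fully distributed learner: uses only her own realized payoffs."""
    def __init__(self, n_actions, p_active=0.7, temperature=0.2):
        self.q = np.zeros(n_actions)    # running estimates of own action payoffs
        self.p_active = p_active        # probability of being in active mode
        self.temperature = temperature  # Boltzmann softmax temperature

    def strategy(self):
        z = np.exp(self.q / self.temperature)
        return z / z.sum()

    def act(self):
        return rng.choice(len(self.q), p=self.strategy())

    def update(self, action, payoff, step):
        # Random update: only a player in active mode revises her estimate.
        if rng.random() < self.p_active:
            lam = 1.0 / (step + 10)  # vanishing step size (stochastic approximation)
            self.q[action] += lam * (payoff - self.q[action])

defender, attacker = Player(2), Player(2)
for t in range(50_000):
    d, a = defender.act(), attacker.act()
    defender.update(d, U_DEF[d, a], t)
    attacker.update(a, U_ATT[d, a], t)

print("defender strategy:", np.round(defender.strategy(), 3))
print("attacker strategy:", np.round(attacker.strategy(), 3))
```

With vanishing step sizes, iterates of this random-update form can be analyzed through the mean ODE they track, which is the stochastic-approximation argument the abstract refers to; the heterogeneous case, where the two players use different learning patterns, is what the security application exercises.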
