Learning through reinforcement for N-person repeated constrained games

The design and analysis of an adaptive strategy for an N-person averaged constrained stochastic repeated game are addressed. Each player is modeled by a stochastic variable-structure learning automaton. Constraints are imposed on functions of the probabilities governing the selection of each player's actions. After each stage, the payoff to each player, as well as the realized constraints, are random variables. No information about the parameters of the game is available a priori. The diagonal concavity conditions are assumed to hold, which guarantees the existence and uniqueness of the Nash equilibrium. The suggested adaptive strategy, which uses only the current realizations (outcomes and constraints) of the game, is based on the Bush-Mosteller reinforcement scheme combined with a normalization procedure. A Lagrange multiplier approach with regularization is used to handle the constraints. The asymptotic properties of the algorithm are analyzed, and simulation results illustrate the feasibility and performance of the adaptive strategy.
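To make the reinforcement mechanism concrete, the following is a minimal sketch of a Bush-Mosteller update for a single variable-structure learning automaton, written in Python/NumPy. The function name, step size, and toy reward model are illustrative assumptions only; the paper's actual strategy also normalizes the payoff-plus-constraint signal and applies a regularized Lagrange multiplier scheme, which is not reproduced here.

```python
import numpy as np

def bush_mosteller_update(p, action, reward, gamma=0.05):
    """One Bush-Mosteller step for a single learning automaton.

    p      : current action-probability vector (sums to 1)
    action : index of the action just played
    reward : environment response normalized into [0, 1]
    gamma  : learning-rate parameter (illustrative value)

    Sketch only: the paper's strategy additionally folds the constraint
    realizations into the reinforcement signal via regularized Lagrange
    multipliers, which is omitted here.
    """
    e = np.zeros_like(p)
    e[action] = 1.0
    # Shift probability mass toward the played action in proportion
    # to the normalized reinforcement it produced.
    p_new = p + gamma * reward * (e - p)
    return p_new / p_new.sum()  # guard against round-off drift

# Hypothetical usage: one player adjusting a mixed strategy over 3 actions
# from random outcomes (the uniform reward below is invented for illustration).
rng = np.random.default_rng(0)
p = np.ones(3) / 3
for _ in range(1000):
    a = rng.choice(3, p=p)
    reward = rng.uniform(0.0, 1.0)  # stands in for the normalized game outcome
    p = bush_mosteller_update(p, a, reward)
```

Because the update is a convex combination of the current probability vector and the unit vector of the played action, each iterate remains a valid probability distribution, which is the property the normalization procedure in the paper is designed to preserve.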
