Rational and Convergent Learning in Stochastic Games

This paper investigates policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties that are desirable for a learning agent in the presence of other learning agents: rationality and convergence. We examine existing reinforcement learning algorithms against these two properties and find that none of them meets both criteria simultaneously. We then contribute a new learning algorithm, WoLF policy hillclimbing, based on a simple principle: “learn quickly while losing, slowly while winning.” We prove that the algorithm is rational and present empirical results on a number of stochastic games showing that it converges.
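
The principle “learn quickly while losing, slowly while winning” can be illustrated with a minimal sketch. It assumes a tabular Q-learning base, two step sizes with `delta_win` smaller than `delta_lose`, and a "winning" test that compares the expected value of the current policy with that of its running average. The function name `wolf_phc_step`, the data layout, and the default parameter values are assumptions made for illustration and are not taken from the abstract.

```python
def wolf_phc_step(Q, pi, avg_pi, count, s, a, r, s_next,
                  alpha=0.1, gamma=0.9, delta_win=0.01, delta_lose=0.04):
    """Apply one illustrative WoLF-style hill-climbing update in place.

    Q, pi, avg_pi are lists indexed as [state][action]; count is indexed
    by state. Returns the policy step size that was used (assumed layout).
    """
    n = len(Q[s])

    # Standard tabular Q-learning update for the observed transition.
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

    # Maintain a running average of the policies played in state s.
    count[s] += 1
    for b in range(n):
        avg_pi[s][b] += (pi[s][b] - avg_pi[s][b]) / count[s]

    # "Winning" here means the current policy has a higher expected value
    # than the average policy under the current Q estimates.
    v_pi = sum(p * q for p, q in zip(pi[s], Q[s]))
    v_avg = sum(p * q for p, q in zip(avg_pi[s], Q[s]))
    delta = delta_win if v_pi > v_avg else delta_lose

    # Hill-climb: shift probability mass toward the greedy action,
    # never taking more from an action than it currently holds.
    best = max(range(n), key=lambda b: Q[s][b])
    for b in range(n):
        if b != best:
            step = min(pi[s][b], delta / (n - 1))
            pi[s][b] -= step
            pi[s][best] += step
    return delta
```

In this sketch the policy remains a valid probability distribution after every update, because mass is only moved between actions. The larger step is used whenever the agent is not winning, so it adapts faster when its current policy does worse than its own historical average.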
